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(57) Abstract 

A human motion following controller is provided by the 
invention to augment motion of items (e.g., computer cursor or 
scene view) shown on a computer display. The display is coupled 
to the computer which controls positioning of the items through 
operating system controls. A camera captures frames of data 
corresponding to a first image of at least part of a user (e.g., eyes, 
hands) at the computer display. Signal processing electronics 
coupled to the camera (a) detects differences between successive 
frames of data corresponding to motion of the first image, and 
(b) communicates differences information to the computer to 
reposition display of the items through the operating system 22 
controls. The items are thus repositioned on the display by an 
amount corresponding to the motion of first image. 
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HUMAN MOTION FOLLOWING COMPUTER MOUSE AND GAME 

CONTROLLER 

A portion of the disclosure of this patent document contains material which is 
5 subject to copyright protection. The copyright owner has no objection to the 
facsimile reproduction of any one of the patent document or the patent disclosure, as 
it appears in the Patent and Trademark Office patent file or records, but otherwise 
reserves all copyright rights whatsoever. 

Related Applications 

This application is a continuing application of commonly-owned and co- 
pending U.S. provisional application number 60/070,512, filed on January 6, 1998, 
and U.S. provisional application number 60/100,046, filed on September 11, 1998, 
each of which is incorporated herein by reference. 

15 

Background 

The primary human interfaces to today's computer are the keyboard, to enter 
textual information, and the mouse, to provide control over graphical information. 
20 These interfaces help users with word processing, presentation software, computer 
aided design packages, spreadsheet analyses, and other applications. These 
interfaces are also widely used for computer gaming entertainment; though they are 
often augmented or replaced by a joystick. 

25 In daily use of business software applications, control of cursor position on 

the screen requires that the user remove his/her hand from the keyboard in order to 
use the standard mechanical mouse. The use of the mouse introduces several issues. 
In a desk environment, the mouse requires maintenance of space on the desk area. 
The mouse cord must also remain free from obstruction to facilitate movement. 

30 Additionally, the use of the mouse is a major contributing factor of carpal-tunnel 

l 



10/26/2006, EAST Version: 2,1.0.14 



WO 99/35633 PCT/US99/00086 

syndrome. It would be advantageous therefore to find an alternative to the 
mechanical mouse. 



In computer gaming, game complexity generally requires control of the (i) 
5 mouse and keyboard, or (ii) joystick and keyboard. Further, gaming applications 
usually require control in several axes of motion, including forward motion, reverse 
motion, left turn, right turn, left strafe (slide), right strafe, upward motion, 
downward motion. To further complicate game maneuvers and control, many 
games permit viewing (within the game environment) in directions different from 
10 . that in which the vehicle (e.g., the car, or person, simulated within the game) is 
moving, including up, down, left and right. These many complexities of motion in- 
fact increase or modify the complexity and enjoyment of the game. 

Nevertheless, these complexities require that the user have utmost dexterity 
15 and control of his/her body. One object of the invention, therefore, is to offer 
alternative approaches to human-computer interfaces for those incapable of using 
standard devices (e.g., mouse, keyboard and joystick) such as due to disability. 

Another object of the invention is to provide an alternative input device for 
20 laptop computers. Laptop computers are used in locations which do not allow the 
use of a mouse, in airplanes or during business meetings in which there is no room 
to operate the mouse. Through the use of either a clip on camera or a camera built 
into the laptop display, the laptop user can control the mouse position or use the 
camera for teleconferencing while on the road. 

25 

Other objects of the invention are to replace or augment existing human 
computer interfaces to facilitate enhanced gaming and/ or control within game 
environments. 

30 In the prior art, certain systems exist which attempt to reduce the amount of 

physical interaction required with game controllers. However, such systems are 

2 
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prohibitively expensive to the general public as their costs are driven by techniques 
and algorithms which detect user head motion based upon a detectable target worn 
by the user. Other costly and cumbersome systems require the user to wear 
apparatus which emits or detects a signal. It is thus one other object of this invention 
5 to provide a system which detects user motion without the aid or augmentation of 
artificial devices placed on the user operator. 

Another object of the invention is to provide a means of human control of a 
graphical computer interface through the physical motion of the user in order to 
10 control the activity of a cursor in the manner usually accomplished with a computer 
mouse. 

A further object of the invention is to provide additional degrees of freedom 
in the human computer interface in support of computer games and entertainment 
15 software. 

Yet another object of the invention is to provide dual use of teleconferencing 
and video electronics with gaming and computer control systems. 

20 These and other objects will be apparent in the description which follows. 

Summary of Invention 

As used herein, "cursor" means a computer cursor associated with a 
25 computer screen. "Scene view" means the view presented on a computer display to 
a user. For example, one scene view corresponds to the scene presented to a user 
during a computer game at any given moment in time. The game might include 
displaying a scene whereby the user appears to be walking in a forest, and through 
trees. In another example, a cursor might also be visible in the scene view as a 
30 mechanism for the user to select certain events or items on the scene (e.g., to open a 
door in a game, or to open a folder to access computer files). 

3 
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As used herein, "camera" refers to a solid state instrument used in imaging. 
Typically, the camera also includes optical elements which refract light to form an 
image on the camera's detector elements (typically CCD or CMOS). For example, 
5 one camera of the invention derives from a video-conferencing camera used in 
conjunction with Internet communication. 

In one aspect, the invention provides systems and methods to control 
computer cursor position (or, for example, the scene view or game position as 
10 displayed on the computer display) by motion of the user at the computer. A 

* 

camera rests on or near to the computer, or built into the computer, and connects 
therewith to collect "frames" of data corresponding to images of the user. These 
images provide information about user motion, over time. Software within the 
computer assesses these frames and algorithmically adjusts cursor motion (or scene 

15 view, or mouse button, or some other operation of the computer) based upon this 
motion. The motion may be imparted by up-down or left-right motion of the user's 
head, by the user's hands, or by other motions presented to the video camera (such 
as discussed herein). In one aspect, a close up view of the users facial features is 
used to impart a translation in the cursor (or scene view) even through the features 

20 in fact rotate with the user's head. In yet another aspect, the rotation is used to 
generate a corresponding rotation in computer game scene imagery. 

In one aspect, the invention also provides a human factors approach to cursor 
movement in which the user's rate of motion determines the relative motion of the 
25 cursor (or scene view). By way of example, the faster the user's head travels over a 
set distance, the further the corresponding cursor movement over the same time 
period. 

In other aspects of the invention, the camera is either (a) a visible light camera 
30 utilizing ambient lighting conditions or (b) a camera sensitive in another band such 
as the near infrared ("IR"), the IR, or the ultraviolet ("UV") spectrum. In the latter 

4 
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case (b), the illumination preferably emanates from a source such as an IR lamp 
which is beyond human sensory perception. The sensor is typically mounted facing 
the user so as to capture a picture of the user's face in the associated electromagnetic 
spectrum. The lamp is typically integrated with the camera housing so as to facilitate 
5 production and ease of consumer set-up. 

In one aspect, a system of the invention provides an IR camera (i.e., a camera 
which images infrared radiation) to image the user's face and to gauge the user's 
stress level associated with a game on the computer. As the user's intensity increases 

10 (such as in a fast moving computer game using a joystick or the methods discussed 
herein), the system detects increased heat intensity on the user's face, forehead or 
other body part by the imagery of the IR camera. This information is fed back into 
the game processor to provide further enhancement to the game. In this manner, the 
system gauges the user's reaction to the game and modifies game speed or operation 

15 in a meaningful way. For example, suppose such a system determined that a 
particular user was bored of the present game speed (a determination of boring can 
be made by assessing low IR output over large portions of the user's face). The 
computer processor and game software can then cooperate to increase the gaming 
speed and thereby increase this particular user's stress. Games of the invention are 

20 thus made and sold to users with varying intelligence, age and/or computer 
familiarity; and yet the system always "pushes the envelope" for any given user so 
as to make the game as interesting as possible, automatically. 

In accord with one aspect of the invention, images captured by the sensor are 
25 processed by a digital signal processor ("DSP") located either (a) in a PC card within 
the host computer or (b) in a housing integrated with the sensor. In case (a), sensor 
frames are sent to the PC card; and detected user motion (sometimes denoted herein 
as "difference information") is communicated to the user's operating system via a 
PCI (or USB or later standard) bus interface. These difference information 
30 commands are interpreted by a low overhead program resident at the user's main 
processor, which either updates the cursor position on the screen or provides motion 
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information to the user's computer game (e.g., so as to change the scene view). In 
case (b), the DSP is contained within the camera housing; and frames are processed 
local to the camera to determine difference information. This information is then 
transmitted to the computer by a cable that connects to a bus port of the computer so 
5 that the host processor can make appropriate movements of the cursor or scene 
view. In another aspect, the DSP is mounted in the camera housing such that the 
camera/ signal processing subsystem produces signals which emulate the mouse via 
the mouse input connector. 

10 In an alternative configuration, frames of image data are sent directly to the 

host computer through the computer bus; and that image data is manipulated by the 
computer processor directly. With increasing computer processing speed, it is 
expected that sensor data frames can be sent directly to the host processor for all 
processing needs, in which case the PC card and/ or separate DSP are not required. 

15 Although this is possible today, the update rates are likely too slow for practicality. 
Once GHz processors are on the market, a separate DSP may no longer be needed. 

* 

In one aspect of the invention, pixel format or pixel density of the camera 
drives the accuracy of the system. Higher pixel density in the image of the user's 

20 face, for example, increases the attainable resolution and cursor control (or the 
attainable control of scene view motion). Camera formats of 240 vertical by 320 
horizontal generally provide satisfactory performance. The number of pixels that 
may be utilized is determined by system cost factors. Greater numbers of pixels 
require more powerful DSPs (and thus more costly DSPs) in order to process the 

25 image sequences in real time. Current technology limits the processing density to a 
64x64 window for consumer electronics. As prices are reduced, and power 
increases, the densities can increase to 128x128, 256x256 and so on. While 64x64 
density is satisfactory for general household users, a higher fidelity system using a 
greater number of pixels is possible, in accord with the invention, for higher end 

30 applications at a proportionally higher cost. Non-square pixel formats are also 
possible in accord with the invention, including a 64x128 detector array size. 

6 
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In one aspect, the data transfer rate from the camera is 30 frames/ second at 
240x320 pixels per frame. Assuming eight bits per pixel, the digital data transfer rate 
is therefore 18.432 megabits/second. This is a fairly high transfer rate for consumer 
5 products using current technology. While the data transfer can be either analog or 
digital, the preferred method of image data transfer for this aspect is via a standard 
RS170 analog video interface. 

In accord with one aspect, a system of the invention defines two imaging 
10 zones (either within a single camera CCD or within multiple CCD cameras housed 
within a single housing). One imaging zone covers the user's head; and the other 
covers the user's eyes. This aspect includes processing means to process both zones 
whereby movement of the user's head provides one mechanism to control cursor 
movement (or scene view motion), and whereby the user's eyes provide another 
1 5 mechanism to control the movement. In essence, this aspect increases the degrees of 
freedom in the control decision making of the system. By way of example, a user 
might look left or right within a game without moving his head; but by assessing 
movement of the user's eyes (or the pupils of those eyes), the scene view can be 
made to rotate or translate in the manner desired by the user. Further, a user might 
20 move his head for other reasons, and yet not move her eyes from a generally 
forward looking position; and this aspect can assess both movements (head and 
eyes) to select the most appropriate movement of the cursor or scene view, if any. 

In another aspect, a system of the invention utilizes a camera with zoom 
25 optics to define the user's pupil and to make cursor or scene views move according 
to the pupil. In another aspect, the system incorporates a neural net to "learn" about 
a user's eye movements so that more accurate movements are made, over time, in 
response to the user's eye movement. 

30 In still another aspect, a neural net is used to learn about other movements of 

the user to better specify cursor or scene view movement over time. 

7 ■ 
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In yet another aspect of the invention, a system is provided with two CCD 
arrays (either within a single camera body or within two cameras). The arrays 
connect with the user's computer by the techniques discussed herein. One CCD 
array is used to image the user's head; and the other is used to image the user's 
body. Motion of the user is then evaluated for both head and body movement; and 
cursor or scene view movement is adjusted based upon both inputs. 

In another aspect of the invention, a single CCD is used to image the user. 
However, alternate frames are zoomed, electronically, so that one frame views the 
user's head, and the next frame views the user's eyes. With the algorithm discussed 
herein, these separate frame sequences (one for the eyes, one for the head) are 
processed separately and evaluated, together, to make the most appropriate cursor 
or scene view movement. If for example, the system clocks at 30Hz, then one set of 
frame sequences operates at 15Hz, and the other at 15Hz. However, the advantage is 
that two movement information sets can be evaluated to invoke an appropriate 
movement in the cursor or scene view. 

Those skilled in the art should appreciate that different frame rates can be 
used; and frame rates for either sequence (head or eyes) can occur at different rates 
too. Further, the separate frame sequences can utilize other body parts, e.g., the head 
and the hand, to have two movement evaluations. Alternatively, a separate camera 
(or CCD array) can be used to image other body parts, for example one camera for 
the head and one for the hand. 

The invention also provides methods for shifting cursor or scene views in 
response to user movement. In one aspect, the scene view shifts left or right when 
the user shifts left and right. In another aspect, the scene view rotates when the 
user's head rotates. This last aspect can be modified so that such rotation occurs so 
long as the eyes do not also rotate (in this situation, the user's head rotates, 
indicating that she wishes the scene view to rotate; but the eyes do not, indicating 
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that she still watches the game in play). In another aspect, the scene view rotates in 
response to the user's hand rotation (i.e., a camera or at least a CCD array of the 
system is arranged to view the player's hand). 

5 In another aspect, the invention provides a multi-zone player gaming system 

whereby the user of a particular computer game can select which zone operates to 
move the cursor or the scene view. By way of example, the system can include one 
zone corresponding to a view of the user's head, where frames of data are captured 
by the system by a camera. Another zone optionally corresponds to the user's hand. 

10 Another zone optionally corresponds to the user's eyes. Each zone is covered by a 
camera, or by a CCD array coupled within the same housing, or by optical zoom 
zones within a single CCD, or by separate optical elements that image different 
portions of the CCD array. By way of example, two zones can be covered with a 
single CCD array (i.e., a camera) when the zones are the user's head and eyes. The 

15 camera images the head, for one zone, and images the eyes in another zone, since the 
zones are optically aligned (or nearly so). However, two cameras (or optionally two 
CCD arrays with separate optics) can view two zones such as the user's head and the 
user's hand. Combinations of zones is also possible and envisioned in accord with 
the invention. 

20 

Zones in a single camera can also be identified by the computer by 
prompting the user for motion from corresponding body parts. For instance, the 
computer identifies the head zone by prompting the user to move his head. Then 
the computer identifies the foot zone by having the user move his foot. Once the 
25 zones are identified, the motion of each of these individual zones is tracked by the 
computer and the regions of interest in the camera image related to the zones moved 
as the targets in the zones move with respect to the camera. 

In one aspect, the invention provides a system, including a camera and edge 
30 detection processing subsystem, which isolates edges of the user's body, for 
example, the side of the head. These edges are used to move the cursor or scene 

9 
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view. For example, if the left edge of head is imaged onto column X of one frame of 
the CCD within the camera, and yet the edge falls in column Y in the next frame, 
then a corresponding movement of the cursor or scene view is commanded by the 
system. For example, movement of the edge from one column to the next might 
correspond to ten screen pixels, or other magnification. In one aspect, this 
magnification is selected by the user. Up and down motion can also be detected by 
similar edge detection. For example, by imaging the user's chin, an edge movement 
in the up or down dimension is formed (e.g., if the bottom edge of the chin moves 
from one row to the next, in adjacent frames, then a corresponding movement of the 
cursor or scene view is made - magnification again preferably set manually with a 
default starting magnification). Other images can also serve to define edges. For 
example, in one aspect, a user's eyelash can be used to move the cursor (or scene 
view) up or downwards; though typically the eye blink is used to reset the cursor 
command cycle. 

In one aspect, an optical matched filter is used to center image zones onto the 
appropriate images. For example, as discussed above, one aspect preferably utilizes 
64x64 pixels as the image frame from which cursor motion is determined. Many 
cameras have, however, many more pixels. These 64x64 arrays are therefore 
preferably established through matched filtering. By way of example, an image of a 
standard pair of user's eyes is stored within memory (according to one aspect of the 
invention). This image field is cross-correlated with frames of data from the actual 
image from the camera to "center" the image at the desired point. With eyes, 
specifically, ideally the 64x64 sample array is centered so as to view both eyes within 
the 64x64 array. Similarly, to process sequences of head data, a standard head image 
is stored within memory, according to one aspect, and correlated with the actual 
image to center the head view. 

Those skilled in the art should appreciate that an appropriate frame size can 
be established from an image having more or fewer pixels, by redundantly allocating 
data into adjacent pixels or by eliminating intermediate pixels, or similar technique. 

10 
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In another aspect, a camera is provided which optically " zooms" to provide 
optimal imaging for a desired image zone. By way of example, the invention of one 
aspect takes an image of the user's head, determines the location of the user's eyes 
(such as by matched filtering), and optically zooms the image through movement of 
optics to provide an image of the eyes in the desired processing size format. 

Many aspects of the invention are preferably enhanced by autofocus. 
Specifically, it is often desirable to have a crisp image of the user (or a part of the 
user, e.g., the user's eyes) in order to accurately process desired cursor or scene view 
movement. Thus, autofocus capability preferably operates in most of the aspects of 
the invention where imaging is a feature of the processing. 

In one aspect, the camera utilizes a very small aperture which results in a very 
large depth of field. In such a situation, autofocus is not required or desired. The 
optical requirements for the lenses are also reduced. 

The invention thus provides several advantages over the art. For example, 
game controllers can now include feedback corresponding to the user's actual 
movement. By way of another example, if the user moves left or right (or head or 
hand or eyes move left or right, depending on the image zone), then the cursor (or 
scene view) can also be set to move left or right. When the user twists her head, for 
example, the scene view can also be made to rotate, reflecting that movement. 

Those skilled in the art should appreciate that the direction in which the scene 
moves, left or right, is a matter of design choice. That is, certain games might find it 
desirable to move the opposite direction from what the user moves, to add certain 
challenges to the game. Further, in other aspects, this direction can change during 
the game to further complicate game control. 

In accord with one aspect of the invention, a processing subsystem (connected 
with the camera) is used to make cursor movement (or scene view movement) 

ll 
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correspond to user's motion. This processing subsystem of another aspect further 
detects when the user twists his head, to add an additional dimension to the 
movement. 

5 In one aspect, a system of the invention includes an IR detector which is used 

to determine when a person sweats or heats up (by imaging, for example, part of the 
user's head onto the IR detector); and then the system adjusts game speed in a way 
corresponding to this movement. Alternatively, a heartbeat sensor is tied to the 
person to sense increased excitement during a game and the system speeds or slows 

10 the game in a similar manner. Note that a heartbeat sensor can be constructed, in one 
• aspect of the invention, by thermal imaging of the user's face, detecting blood flow 
oscillations, indicative of heartbeat. In other aspects, the heartbeat sensor is 
physically tied to the user, such as within the computer mouse or joystick. 

15 In one aspect, a computer of the invention adapts to user control as selected 

by a particular user. For example, in the case of a handicapped person, a particular 
user might select certain hand-movements, e.g., a single finger up, to move the 
cursor up; and another finger down to move the cursor left. An infinite combination 
of controls can be established; however this is one advantage of the invention in that 

20 users with many different disabilities can program cursor or scene view movement. 
In one aspect, a neural network is used to assist the processing system in establishing 
proper cursor movement. In another aspect, the computer for example learns to 
print something by movement of the user's finger (or other body part). 

25 In one aspect, tipping of the user's head (or other body part, or object) is used 

to provide another degree of freedom in moving the cursor or adjusting the scene 
view. By way of example, a tilt of the head, as imaged by the camera, can be set to 
command a rotation of the scene view. 

30 In still another aspect, a camera of the invention uses autozoom to move in 

and out of a given scene view. By way of example, the camera is first focussed on the 
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user's face in one frame; but in a subsequent frame the camera must focus to closer 
to compensate for the fact that the user moved closer to the camera (typically, the 
camera is on the monitor, so this also means that the user moved closer to the scene 
view). This autofzoom is used, in one aspect, to make the scene view appear as if the 
user is "creeping" into the scene. By moving the scene in and out, the user will 
perceive that he is moving in or out of the scene view. 

In another aspect, a camera images an object held by the user. Preferably, the 
object has a well-defined shape. The system images the object and determines 
difference information corresponding to movement of the object. By way of example, 
rotating the object upside down results in difference information that is upside 
down; and then the scene view inverts by operation of the system. In another 
example, twisting of the object rotates the scene view left or right, or rotates the 
scene in the direction of the twisting. 

In another aspect, two cameras image the user: one camera pointed at the 
front of the users face or hand and the other down at the top of the users head or 
hand. The front facing camera is used to detect rotational and linear translation in 
up-down and left-right directions. The top viewing camera determined front-back, 
left right translation. The front-back translation observed by the top camera is used 
to control forward and back motion in the users 3-D view. The top sensed left-right 
translation controls the users left right slide or strafe. The top sensed left-right 
motion is removed from the front view left-right translation with the remaining 
front view measure representative of left-right twist. All of the front view up-down 
translation can be interpreted as up-down twist. 

Brief Description of the Drawings 

Figure 1 illustrates one human computer interface system constructed 
according to the invention; 
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Figure 1A shows an exemplary computer display illustrating cursor 
movement made through the system of Figure V, 

Figure IB illustrates overlayed scene views, displayed in two moments of 
time on the display in Figure 1, of a shifting scene made in response to user 
movement captured by the camera of Figure 1; 

Figure 1C shows an illustrative frame of data taken by the system of Figure 1; 

Figure 2 illustrates selected functions for a printed circuit card used in the 
system of Figure 1; 

Figure 3 illustrates an algorithm block diagram that preferably operates with 
the system of Figure 1; 

Figure 4 illustrates one preferred algorithm process used in accord with the 
invention to determine and quantify body motion; 

Figure 5 shows one process of the invention for communicating body motion 
data to a host processor, in accord with the invention, for augmented control of 
cursor position or scene view; 

Figure 5A shows representative frame of data of a user taken by a camera of 
the invention, and further illustrates adding symbols to key body parts to facilitate 
processing; 

Figure 6 illustrates a two camera imaging system for implementing the 
teachings of the invention; 
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Figure 7 illustrates two positions of a user as captured by a camera of the 
invention; and Figure 7 A illustrates two positions of a scene view on a display as 
repositioned in response to movement of the user illustrated in Figure 7; 

5 Figure 8 illustrates motion of a user - and specifically twisting of the user's 

head - as captured by a system of the invention; Figure 8A illustrates a first scene 
view corresponding to a representative computer display before the twisting; Figure 
8B illustrates a second scene view corresponding to a rotation of the first scene view 
in response to the twisting by the user; Figure 8C shows processing features of the 

10 processing section of Figure 8; and Figure 8D illustrates multiple image frames 
stored in memory for matched filtering with raw images acquired by the system of 
Figure 8; 

Figure 9 illustrates a two camera system of the invention for collecting N 
15 zones of user movement and for repositioning the cursor or scene view as a function 
of the N movements; Figure 9A illustrates a representative thermal image captured 
by the system of Figure 9; and Figure 9B illustrates process methodology for 
processing thermal images as a real time input to game processing speed, in accord 
with the invention; 

20 

Figure 10 illustrates another two camera system of the invention for targeting 
multiple image movement zones on a user, and further illustrating optional DSP 
processing at the camera section; 

25 Figure 11 illustrates framing multiple movement zones with a single imaging 

array, in accord with the invention; 

Figure 12 illustrates framing a user's eyes in accord with the invention; and 
Figure 12A shows a representative image frame of a user's eyes; 

30 
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Figure 13 illustrates one system of the invention, including zoom, neural nets, 
and autofocus to facilitate image capture; 

Figures 14, 14A and 14B illustrate autofocus motion control in accord with 
the invention; 

Figures 15 and 15A illustrate one other motion detect system algorithm 

* 

utilizing edge detection, in accord with the invention; 

Figure 16 illustrates one other motion detect system algorithm utilizing well- 
characterized object maniputions , in accord with the invention; 

Figure 17 illustrates one other motion detect system algorithm utilizing varied 
body motions, in accord with the invention; 

Figure 18 illustrates a two camera system of the invention with a camera 
observing the user's face while the other observes the top of the user's head; 

Figure 19 shows a blink detect system of the invention; and 

Figure 20 shows a re-calibration system constructed according to the 
invention. • 

Detailed Description of the Drawings 

Figure 1 illustrates, in a top view, certain major components of a human 
computer interface system 10 of the invention, A user 12 of the system 10 sits facing 
a computer monitor 14 with display 14a. A camera 16 is mounted on the computer 
monitor 14 facing the user 12. In the illustrated embodiment, the camera 16 is 
mounted in such a way that the user's face 12a is imaged within the camera's field of 
view 16a. However, as discussed herein, the camera 16 can alternatively image other 
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locations, such as the user's hand, eyes, or on other objects; so imaging of the user's 
face, in Figure 1, must be considered illustrative, rather than limiting. Further, the 
camera location can also reside at places other than on top of the monitor 14. 

With further regard to Figure 1, the camera 16 interfaces with a printed circuit 
card 18 mounted within the user's computer chassis 20 (which connects with the 
monitor 14 by common cabling 20a). The camera 16 interfaces to the printed circuit 
card 18 via a camera interface cable 22. The circuit card 18 also has processing 
section 18a, such as a digital signal processing ("DSP") chip and software, to process 
images from the camera 16. 

In operation, the camera 16 and card 18 capture frames of image data 
corresponding to user movement 25. The processing section 18a algorithmically 
processes the image data to quantify that movement 25; and then communicates this 
information to the host processor 30 within the computer 20. The host processor 30 
then commands movement of the computer cursor in a corresponding movement 
25a, Figure 1A (Figure 1A illustrates a representative front view of the display 14a, 
and also illustrates movement 25a of the cursor 26 moving within the display 14a in 
response to user movement 25). 

Figure IB illustrates an alternative (or supplemental) process whereby the 
scene view shifts in response to user movement 25. Specifically, Figure IB illustrates 
a first scene view 35a, which generally corresponds to a forest prior to the user's 
movement 25; and an overlayed scene view 35b (shown in dotted line, for purposes 
of illustration) that is shifted by an amount 37 in response to the user's movement 25. 
The shift 37 in the scene view 35 is accomplished by combined operation and 
processing of the processing section 18a and host CPU 30. 

Figure 1C shows a representative frame 41 of data 43 as taken by the camera 
16. As illustrated, data 43 represents the user's face 12a taken at a given moment of 
time. Subsequent frames (not shown) are used to determine user motion 25 relative 
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to the frame 41, as discussed herein. The frame 41 is made up of the plurality of 
pixel data 45, as known in the art. 

» 

Figure 2 illustrates certain functions processed within the printed circuit 
5 board 18 of Figure 1. A camera interface circuit 50 receives video data from the 
camera 16 through interface cable 22. This video data can be RS170 format or digital, 
for example. For analog RS170 format, circuit 50 decodes the analog video data to 
determine video timing signals embedded in the analog data. These timing signals 
are used for control of the analog-to-digital (A/D) converter included in circuit 50 
10 that converts analog pixel data into digital images. In the preferred embodiment, the 
analog data is digitized into 6-bits, though any number of bits greater may be 
acceptable and/ or required for features as discussed herein. For digital data format, 
camera interface 50 accepts the digital data without additional quantization, 
although interface 50 can digitally pre-process the digital images if desired to 
1 5 acquire desired image features. 

The frame difference electronics 52 receives digital data from the camera 
interface circuit 50. The frame difference electronics 52 include a multiple frame 
memory, a subtraction circuit and a state machine controller/ memory addresser to 

20 control data flow. The frame memory holds previously digitized frame data. As 
each digitized pixel is received by the frame difference electronics 52, the 
corresponding pixel from a previous frame is read from the frame memory and 
subtracted from the current frame. The preferred implementation uses the frame 
just previous to the current frame, though an older frame which resides in the frame 

25 memory may be used. The resulting difference is output to an N-frame video 
memory 54. The new frame pixel data is then stored into the frame memory of the 
frame difference electronics 52. 



The N frame video memory electronics 54 either receives differenced frames 
30 output by the frame difference electronics 52 (discussed above) or raw digitized 
frames from the camera interface 50. The choice of where the data derives from is 
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made by software resident on the DSP 56. The frame video memory 54 is sized to 
hold more than one full frame of video and up to N number of frames. The number 
of frames N is to be driven by hardware and software design. 

5 In the preferred embodiment, the DSP 56 implements an algorithm discussed 

below. This algorithm determines the rate of head motion of the user in two 
dimensions. The digital signal processor 56 also detects the eye blink of the user in 
order to emulate the click and double click action of a standard mouse button. In 
support of these functions, the DSP 56 commands the N frame video memory 54 to 

10 supply either the differenced frames or the raw digitized frames. The digital signal 
processor thus preferably utilizes a supporting program memory 58 made up of 
electrically reprogrammable memory (EPROM) and data memory 59 including 
standard volatile random access memory (RAM). The DSP 56 also interfaces to the 
PCI bus interface electronics 60 through which cursor and button emulation is 

15 ' passed to the user's main processor (e.g., the CPU 30, Figure 1). The PCI interface 60 
also passes raw digitized video to the main processor as an optional feature. 
Interface 60 also permits reprogramming of program memory 58, to allow for future 
software upgrades permitting additional features and performance. 

20 The PCI interface electronics 60 thus provides an industry standard bus 

interface supporting the aforementioned communication path between the printed 
circuit card 18 and the user's main processor 30. 

With optional MPEG compression electronics 62, the printed circuit card 18 
25 and camera 16 can provide compressed video to the user's main processor 30. This 
compressed video supports using the system 10 in teleconferencing applications, 
providing dual use as either human computer interface system 10 and/ or as a 
teleconferencing system in an economical solution to two distinct applications. 

30 Figure 3 describes one preferred head motion block diagram algorithm 70 

used in accord with the invention. Not all of the functions shown in Figure 3 are 
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implemented in software in the DSP 56. Rather, this algorithm relies on the 
correlation of images from one frame to the next, and particularly relies on the use of 
frame differenced images in the correlation process. The frame differencing 
operation removes parts of the camera images that are unchanged from the previous 

5 frame. For example, room background (such as object 13, Figure 1) behind the user 
12 is removed from the image. This greatly simplifies detection of feature motion. 
Even the image of the user's face image consists of regions of uniform illumination 
such that even with the user's facial motion, these uniform regions (i.e. cheeks, 
forehead, chin) may also be removed. The user's face 12a also consists of typically 

10 dynamic features such as the nose, eyes, eyebrows and mouth, each of which 
typically has enough spatial detail that will be evident in the differenced image. As 
the user moves his face with respect to room lighting, the shape and distribution of 
these features will change; but the frame rate of the camera 16 ensures that these 
features look similar from one frame to the next. The correlation process therefore 

15 operates to determine how these differenced features are moving from one frame to 
the next in order to determine user head motion 25. 

The algorithm of block diagram 70, Figure 3, receives video images 72 of the 
user as imaged by camera 16 over time. Each received image is passed to both a 

20 frame memory 74 and a differencer 76. Though the preferred embodiment is to 
buffer a single frame in memory 74, the memory 74 may optionally store many 
frames, buffered such that the first frame input is the first frame output. The 
delayed frame is read from the frame memory 74 and subtracted from the current 
frame using the differencer 76. Frame output from the differencer 76 is provided to 

25 both a correlation process 78 and a difference frame memory 80, 

Like the frame memory 74, the preferred embodiment of frame memory 80 
utilizes a single difference frame; however the difference frame memory 80 can hold 
many difference frames in sequence for a finite time period. The delayed difference 
30 frame is read from the difference frame memory and provided to the correlation 
function 78. Difference frames are preferably selectable by system algorithms. The 
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correlation process 78 determines the best combination of row and column shifts in 
order to minimize the difference between the current difference frame and the 
delayed difference frame. The number of rows and columns required to align these 
difference images provides information as to the user's motion. 

The best-fit function algorithm 82 determines the row and column shift to 
provide optimal alignment. In the case of a classical correlation process, the best-fit 
function can consist of a peak detect algorithm. This algorithm may either be 
implemented in hardware or in software. 

The best-fit function algorithm determines relative motion in rows and 
columns of the observed user's features. The cursor update compute function 
algorithm 84 translates this measured motion into the position change required of 
the cursor (e.g., the cursor 26, Figure 1A). Typically, this is a non-linear process that, 
with greater head motion, the cursor moves a non-proportionally greater distance. 
For example a 1-pixel user motion can cause the cursor to move one screen pixel 
while a 10-pixel user motion may cause a 100-pixel screen cursor motion. However, 
these magnifications can be adjusted for desired result. This algorithm may either be 
implemented in hardware or in software such as through an ASIC or FPGA. 

Video cursor control 86 provides a user interface to enable and disable the 
operation of cursor control described above. This control is implemented, for 
example, through a combination of keystrokes on the user's keyboard (for example 
as connected to the host computer 20, Figure 1). Alternatively, cursor control is 
activated or deactivated by sensing the eye-blink of the user (or some other 
predetermined movement). In this alternative embodiment, an output signal 85 from 
the correlation section 78 is sent to the video enable section 86; and the output signal 
85 corresponds to blink data from the user's face 12a (Figure 1A). In another 
embodiment, the video cursor control section 86 activates or deactivates cursor 
control by recognizing voice commands. A microphone 87 detects the user's voice 
and a voice recognition section 89 converts the voice to certain activate or deactivate 
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signals. For example, the section 89 can be set to respond to "activate" as a voice 
command that will enable cursor control; and "deactivate" as a command that 
disables cursor control. 

5 The functionality of the video cursor control 86 provides the user with the 

equivalent of a mouse pick-up, put-down action. As the user moves the cursor from 
left to right across the screen, the user would de-activate motion-based cursor 
control in order to allow the user to move his head back to the left. Once the user 
has recentered his head, the user would once again activate the cursor control and 

10 continue to move the cursor about the screen. The activation/ deactivation of the 
mouse input is represented by the switch 90, such that the open position of the 
switch disables human motion control of the cursor and supplies a zero change input 
to the summation operation 92 in such conditions. 

15 Those skilled in the art should appreciate that control of scene view may also 

be implemented by an algorithm such as shown in Figure 3. Specifically, a similar 
algorithm can provide' movement of the current scene view, in accord with the 
invention. 

20 With video cursor control enabled, the result of the cursor update compute 

function 84 is added to the known current cursor position at the summing operation 
92. This summation has a x component and a y component. The result of the 
summation 92 is used to update the cursor position (or scene view) on the user's 
screen via the user's operating system. Cursor position may thus be controlled by 

25 both user motion as well as the motion imparted by another input device such as a 
standard computer mouse. 

Figure 4 provides a detailed description of the preferred implementation of 
the algorithm described in functions 73, 76, 78, 80 and 82 of Figure 3. Video data is 
30 received by the processing electronics in both a single frame memory 100 and a 
differencer 102. The output of the frame memory 100 is also provided to the 

22 



10/26/2006, EAST Version: 2.1.0.14 



WO 99/35633 PCT/US99/00086 

differencer 102 such that the previous frame is subtracted from the current frame. 
This differenced frame is than processed by a two dimensional FFT 104. The complex 
result of the FFT 104 is provided to a complex multiplier 106 and a complex memory 
108. The complex memory 108 is the size of the processed image, each location 

5 containing both a real and imaginary component of a complex number. With each 
new FFT operation 104, the previous FFT result, contained in the complex memory 
108, is provided to the conjugate operation 110. The complex conjugate of each 
element is computed and provided to the complex multiplier 106. In this manner, 
the FFT of the previous frame difference is conjugated and multiplied against the 

10 FFT of the current difference image. 

By way of comparison between Figures 3 and 4, item 76 has similar 
functionality to item 102; item 78 has similar functionality to items 104, 106, 108, 110, 
112; item 80 has similar functionality to item 108; and item 82 has similar 
15 functionality to item 114. 

The two dimensional array of complex products output by the complex 
multiplier 106 is provided to a two dimensional inverse FFT operation 112. This 
operation creates an image of the correlation function 114 between the latest pair of 
20 difference images. The correlation image is processed by the peak detection function 
114 in order to determine the shift required in aligning the two difference images. 
The x-y magnitude of this shift is representative of the user's motion. This x-y 
magnitude is provided to the software used to update the cursor position as 
described in Figure 3. 

25 

Figure 5 shows an algorithm process 130 of the invention and which applies 
motion correlation operations over sub-frames of the video image. This allows 
motions of various body parts to convey input with specialized meaning to 
applications operating on the host computer. In addition to head motion, motion of 
30 the hands, arms and legs provide for greater degrees of freedom for the user to 
interact with the host application (e.g., a game). Commands of this type are useful in 
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combative games where computer animated opponents fight under control of the 
user. In that instance, the hand, arm and leg motions of the user become punch, chop 
and kick commands to the computer after process 130. This command mode can 
also be used in situations where the user does not have ready access to a keyboard, 
to augment cursor control of the previously described head position correlator. 

Process 130 identifies the functions required to derive commands from 
general motions of the user's body. The scene analyzer function 132 receives 
digitized video frames from the camera (e.g., the camera 16 of Figure 1) and 
identifies sub-frames within the video for tracking various parts of the user's body. 
The frame difference function 134 and correlator function 136 provide similar 
functions as processes 74, 76 and 78 of Figure 3. The correlation analyzer 138 
receives correlated difference frames from the correlator function 136 and sub-frame 
definitions from the scene analyzer 132. The correlation analyzer 138 applies a peak 
detection function to each sub-frame to identify the shift required to achieve best 
alignment of the two images. Correlation peaks occurring in the center of the sub- 
frame indicate no motion, while peaks occurring elsewhere indicate the direction 
and magnitude of the user's motion. The motion interpreter 140 receives motion 
vectors for each sub-frame from the correlation analyzer 138. The motion interpreter 
140 links the motion vector from each sub-frame with a particular body segment and 
passes this information onto the host interface 142. The host interface 142 provides 
for communication with the host processor (e.g., CPU 30, Figure 1). It sends data 
packets to the host to identify detected body motions, their directions and their 
amplitudes. The host interface 142 also receives instruction from the host as to which 
. body segments to track which it then passed along to the motion interpreter 140 and 
the scene analyzer 132. 

The scene analyzer 132 first identifies the location of the user's body in the 
image and locates the position of various parts of the user's body such as hands, 
forearms, head, and legs. The techniques and methods used to identify the user's 
body location and body part positions can be accomplished using techniques well 
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known to those skilled in the art (by way of example, via matched filtering). Body 
identification can also be augmented by marking different locations on the user's 
body with unique visual symbols. Unique symbols are assigned to key body joints 
such as elbows, shoulders, hands, neck, knees, and waist and are mounted on the 
body. See for example Figure 5A. 

Figure 5A illustrates one frame 149 of data of an image of the user 150 as 
taken by a camera of the invention. The image corresponds to a full body image of 
the user 150, including arms 151, legs 152, elbows 151a, hands 153, head 154, neck 
155, ears 156, and forhead 157. These parts 151-157 are identified by processes of the 
invention (e.g., spatial location in the image, by matched filtering or other image 
recognition technique), and the image is preferably marked with unique symbols 
(e.g., "X" for center of the face, "Y" for center of the hand 153, "T" for center of the 
user's foot, "Z" for body center, and "F" for forehead 157): 

With further reference to Figure 5, process 130 locates various body parts and 
preferably marks them with symbols to fill in connecting logic (e.g., the left wrist 
and left elbow symbol identify the location of the left forearm). Once the user's body 
parts are located, sub-frames surrounding each of the body segments identified by 
the host processor are generated. A sub-frame is a generally regularly shaped region 
within the image that surrounds a particular body part. The sub-frames are sized to 
center the subject body part in the sub-frame and to provide enough room around 
the body part to accommodate typical body motions. One sub-frame 160 is shown in 
frame 149, Figure 5 A, surrounding the user's foot T. The scene analyzer 132 will 
generally not operate on each frame of video since continuously changing the sub- 
frames adds unnecessary complication to the correlation analyzer 138. Instead, the 
scene analyzer 132 runs as a background process updating the sub-frame locations 
periodically. 

Figure 4 provides a detailed description of one algorithm which can be used 
to implement processes 134-138 of Figure 5. 
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The invention of one embodiment can thus track the motion of the user's 
body using symbols attached to key joints. As an example, the position of the user's 
left lower arm can be determined by locating the unique symbol for the left hand 

"Yl" and left elbow Unique symbols thus allow the processor to rapidly locate 

each portion of the user's body in a video frame. To determine the motion of a 
particular part of the user's body, the algorithm (e.g., Figure 4) compares the 
position of the relevant body parts in consecutive frames and determines how they 
moved (for example, using geometry). Once motion is determined, it is then passed 
to the host CPU where the motion is acted on as appropriate for the particular 
application. 

Figure 6 illustrates a two camera system 200, constructed according to the 
invention. The cameras 202a, 202b are arranged to view separate parts of the user: 
camera 202a images the user's face 204; and camera 202b images the user's hand 206. 
The cameras 202 conveniently rest on top of the computer display 208 coupled to the 
host computer 210 by cabling 216. The cameras 202 couple to the signal processing 
card 212 residing within the computer 210 by cabling 213. As discussed herein, 
motion of the user's head 204 and/ or hand 206 are detected by the signal processing 
card 212, and difference information is communicated to the computer's CPU 210a 
via the computer bus 214. This difference information corresponds to composite 
movement of the head 204 and hand 206; and is used by the CPU 210a to command 
movement of display items on the display 208 (for example, the display items can 
include the cursor or scene view as shown on the display 208 to the user). 
Information shown on the display 208 is communicated from the computer 210 to 
the display 208 along standard cable 216. 

Figures 7 and 7 A illustrate how motion of a user's head is for example 
translated to motion of the cursor and/ or scene view, in accord with the invention. 
Figure 7 shows a representative image 220 of a user captured within a frame of data 
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by a camera of the invention. Figure 7 also shows a representative image 222 (in 
dotted outline, for clarity of illustration) of the user in a subsequent frame of data, 
indicating that the user moved "M" inches. Figure 7 A illustrates corresponding 
scene views on a computer display 224 that is coupled to processing algorithms of 
the invention (i.e., within a system that includes a camera that captures the images 
220, 222 of Figure 7). The display 224 illustratively shows a scene view that includes 
a road 224a that extends off into the distance, and a house 224b adjacent to the road 
224a. A computer cursor 224c is also illustratively shown on the display 224 as such 
a cursor is common even within computer games, providing a place for the user to 
select items (such as the road or house 224a, 224b) within the display 224. The 
display 224 also shows, with dotted outlines 226, the scene view of road and house 
which are shown on the display 224 after motion by the user from 220 to 222, Figure 
7 (the cursor 224c is for example repositioned to position 224c'). The repositioning of 
the scene view from 224a, 224b to 226 occurs immediately (typically less than 1/30 
second, depending upon the camera) after the movement of the user of Figure 7 
from 220 to 222. The scene view is repositioned by x-pixels on the display 224, so 
that M/x corresponds to the magnification between user movement and scene view 
repositioning. This magnification can be set by parameters within the system; and 
can also be set by the user, if desired, at the computer keyboard. Furthermore, the 
rate at which the scene view moves the distance of x-pixels preferably occurs at the 
same rate as the rate of travel along distance M. Alternatively, the magnification can 
be dependent on the rate of motion such that a larger displacement of x-pixels will 
occur for a given motion M if the rate of change of M is larger. 

Figure 8 illustrates a further motion that can be captured by a camera of the 
invention and processed to reposition a scene view, as shown in Figures 8A and 8B. 
More particularly, Figure 8 illustrates a camera 250 connected to a processing section 
252 which converts user motion 254 to corresponding repositioning of the computer 
scene view. As above, the user 256 is captured by the camera's field of view 258 and 
frames of data are captured by the processing section 256. In Figure 8, motion 254 
corresponds to a twisting of the user's head 256; and processing section 252 detects 
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this twisting and provides repositioning information to the host computer (not 
shown). Processing section 252 can also incorporate head-translation motion (e.g., 
illustrated in Figure 15A) into the scene view movement above; and can similarly 
reject translational movement too, if desired, so that no scene motion is observed for 
5 translation of the user 256. 

Figure 8A shows a representative scene view 260 on a display 262 coupled to 
the host computer. Figure 8B illustrates repositioning of the scene view 260' after the 
processing section 252 detects motion 254 and updates the host computer with 
10 difference information (e.g., that information which the host computer uses to rotate 
or translate the scene view). 

Figure 8A also illustrates the intent of the rotating scene view feature. In 
Figure 8A, a person 260a is shown in the scene view 260, except that the person 260a 
15 is almost completely obscured by the edge 262a of the display 262. By twisting the 
head 256 in motion 254, the scene view 262 is rotated in the corresponding direction 
- as shown by scene view 260' in Figure 8B ~ so that the user 260a' is completely 
visible within the scene view 260'. 

20 Figure 8C illustrates further detail of the processing section 252. Camera data 

such as frames of images of a user are input to the section 252 at data port 266. The 
data are conditioned in the image conditioning section 268 (for example, to reduce 
correlated noise or other image artifacts). Thereafter, the camera data is compared 
and correlated in the image correlation section 270, which compares the present 

25 frame image with a series of stored images from the image memory 272. In the 
preferred embodiment, the present data image frame 249 is cross-correlated with 
each of the images within the image memory 272 to find a match. These images 
correspond to a series of images of the user in known positions, as illustrated in 
Figure 8D. 

30 
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In Figure 8D, various images are stored representing various known positions 
of relevant part, here the user's head 256. In the position of Figure 8, for example a 
straight on face shot, the 0° stored memory image would provide the greatest cross- 
correlation value indicating a matched image position. Accordingly, the scene view 

5 would adjust to a zero position. If, however, the image correlated to a -90° position, 
the scene would rotate to such a position. Other movements cause additional scene 
view motions, including tilt and tip of the head, as shown in the two images "0°, 
Down 45 Q// image and the "0°, Up 45°". These images cause the scene view to move 
upwards or to tilt up or down, when the process section 252 correlates the current 

10 frame to these images. As indicated, these images have no left or right component, 
though other images (not shown) can certainly include left, right and tip motion 
simultaneously. 

Figure 9 shows a system 300 constructed according to the invention and 
1 5 including a camera section 302 including an IR imager 304 and a camera 306, both of 
which view and capture frames of data from a user 308. The IR imager 304 can 
include, for example, a microbolometer array (i.e., "uncooled" detectors known in 
the art) which produces a frame of data corresponding to the infrared energy 
emitted from the user, such as illustrated in Figure 9A. Figure 9A shows a 
20 representative frame of IR image data 310, with zones 312 of relatively hot image 
data emitting from regions of forehead, nose and mouth of the user 308. 

The cameras 304, 306 send image data back to the signal processing section 
314. Data from the camera 306 is processed, if desired, as above, to determine 

25 difference information signal 322 used by a connected computer to reposition the 
cursor and/ or scene view. Data from camera 304, on the other hand, is used to 
evaluate how much (or how hot) zones 312 appear on the user during play of the 
computer. The signal processing section 314 assesses the zones 312 for temperature 
and/ or size over the course of a computer game and generate a "game speed 

30 control" signal 320 which is communicated to the user's computer (i.e., that 
computer used in conjunction with the system 300 of Figure 9). The user's computer 
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processes the signal 320 to increase or decrease the speed of a computer game in 
process on the computer. 

Those skilled in the art should appreciate that the IR camera 304 can be used 
without the features of the invention which assess user movement. Rather, this 
aspect should be considered stand-alone, if desired, to provide active feedback into 
gaming speed based upon user temperature and/ or stress. Note that the camera 304 
can also be used to detect heartbeat since the zones 312 generally pulse at the user's 
heartbeat, so that heartbeat rate can also be considered as a parameter used in the 
generation of the game speed control signal 320. Alternatively, a pulse rate can be 
determined by known pulse rate systems that are physically connected to the user 
308. 

An IR lamp 324 can be used in system 300 to illuminate the user 308, with IR 
radiation 324a, such that sufficient IR illumination reflects off of the user 308 
whereby motion control of the cursor and/ or scene view can be made without the 
additional camera 306. The lamp 324 can be, and preferably is, made integrally with 
the section 302 to facilitate production packaging. 

An IR lamp 324 operating in the near-IR can also be used with visible cameras 
of the invention which typically respond to near-IR wavelengths. By way of 
example, certain camera systems now available incorporate six IR emitters around 
the lens to illuminate the object without distraction to the user who cannot see the 
near-IR emitted light. Such a camera is suitable for use with the invention. 

Figure 9B shows process methodology of the invention to process thermal 
user images in accord with the preferred embodiment of the invention. Specifically, a 
system such as system 300 first acquires a thermal image map in process block 326. 
This image is compared to a reference image ("REF") in process block 327. REF can 
either be a temperature of the user (i.e., a temperature of one hot spot of a non- 
stressed user, or the temperature of one hot spot of the user at an initial, pre-game 
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condition) or an amount of the area 312, Figure 9 A, of the user in a non-stressed 
condition or initial pre-game condition). By way of example, REF can be an image 
such as the frame 310 of Figure 9A. When the temperature or area of the region 312 
increases, the system 300 detects this change and determines that the image map 
5 exceeded the REF condition, as illustrated in process block 328. Should the map 
exceed the REF condition, the system 300 communicates this to the host processor 
which in turn adjusts the gaming speed, as desired. If the map does not exceed the 
REF condition, then the next IR image frame is acquired at block 326. 



10 System 300 and the process steps of Figure 9B are thus suitable to adjust 

gaming speed in real time, depending upon user stress level. In the preferred 
embodiment, the gaming speed is increased automatically such that the image map 
exceeds the REF signal for greater than about 50% of the time, so that all users, 
regardless of their ability, are pushed in the particular game. 

15 

Those skilled in the art should appreciate that multi-camera embodiments of 
the invention can and preferably are incorporated into a common housing 338, such 
as shown in Figure 10. Further, as illustrated in Figure 10, cameras can also be made 
from detector arrays 340, processing electronics 342, and optics 344. Each camera 

20 340, 342, 344 is constructed to process the correct electromagnetic spectrum, e.g., IR 
(using, for example, germanium lenses 344 and microbolometer detectors 340). Each 
camera has its own field of view 350a, 350b and focal distance 352a, 352b to image at 
least a part of the user. These field of views 350 can overlap, to view the same area 
such as the user's face, or they can view separate locations, such as the user's head 

25 and hand. 



Cameras of the invention can also include a DSP section 356 such as described 
above to process user motion data. The DSP section 356 processes user motion data 
and sends difference information to the user's host computer. The host computer 
30 thereafter repositions the cursor and/ or scene view based upon the difference 
information so that the user observes corresponding motion on the computer 
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display, as described above. Accordingly, the DSP section need not reside within the 
computer so long as difference information is isolated and communicated to the host 

computer CPU. 

« 

Those skilled in the art should appreciate that algorithms of the invention can 
also be processed directly by the computer's CPU, provided it has sufficient power 
and processing speed, to eliminate a separate DSP chip or section. DSP or equivalent 
processing capability can also provided within the computer's chassis by way of a 
computer printed circuit card installed in the chassis and connected with the camera. 
The location and amount of processing power, therefore, should be considered a 
matter of design choice and current state of the art, each technique being within the 
scope of the invention. 

Figure 11 illustrates frame capture by one camera of the invention to isolate 
zones of imaging according to expected motion patterns. In Figure 11, one frame 370 
of data for example covers the user's eyes 371, corresponding to one image zone; and 
another frame 372 of data can cover the user's head 373, corresponding to another 
image zone. As mentioned previously, preferably the frames 370, 372 are 64x64 
pixels each, or 256x256 (or higher powers of two) to provide FFT capability on the 
image within the frame. A single camera can however provide both frames 370 and 
372, in accord with the invention. Specifically, a dense CCD detector array (e.g., 
480x740 pixels, 1000x1000 pixels, or higher) is used within the camera such that the 
whole array captures an image frame 376 of data, at least covering the available 
image format of the computer display 378. A matched filtering (or other image 
locate process) is processed on the frame 376 to locate the center 371a of the user's 
eyes (in the matched filtering process, an image data set of the user's eyes is stored in 
memory and correlated to the frame 376 such that a peak correlation is found at 
position 371a). Thereafter, a 64x64 array of data is centered about the eyes 371 to set 
the frame 370. To acquire the frame 372, every other pixel is discarded so that, again, 
a 64x64 array is set for the frame 372 (alternatively, each adjacent pair of pixels is 
added and averaged to provide a single number, again reducing the total number of 

32 

10/26/2006, EAST Version: 2.1.0.14 



WO 99/35633 



PCT/US99/00086 



pixels, to 64x64). Note that this process is reasonable since the width of the eyes is at 
..least % the width of the user's face. Nevertheless, further compression can be 
obtained by utilizing every third pixel (or averaging three adjacent pixels) to obtain 
a larger image area in the frame 372. Note that the compression in the width and 
5 length dimensions need not be the same. 

Framing of the information in Figure 11 can occur in several ways. Most 
cameras image at 30Hz so that image motion is smooth to the human eye. In one 
embodiment, one frame 370 is taken in between each frame 372, to minimize data 
10 throughput and processing; and yet to maintain dual processing of the two zones 
imaged in Figure 11. Alternatively, both frames 370, 372 are processed concurrently 
since frame 376 is typically the 30Hz frame. 

Figure 11 also illustrates how framing can occur around the user's eyes 371 to 
15 acquire "blink" information to reset cursor control. A blink detected by the user's 
eyes in frame 370 (or other frame) can be used to (a) disable or enable control of 
cursor or scene movement based upon user control, or (b) simulate pick-up and 
replacement of the computer mouse (i.e., reinitializing movement in a particular 
direction). For example, by detecting a blink of the eyes 371, a system of the 
20 invention can disable human motion following control such as described herein. 
Another blink can be used to enable human motion following control. Blinking can 
also be used to continue motion in a particular direction. For example, movement of 
the cursor can be made to follow movement of the user's head, as described above. 
However, after a while, the person has to move to an uncomfortable position to keep 
25 moving the cursor (or scene). A blink can thus also serve to reposition the head back 
to a normal starting position so that further movement in the desired direction can 
be made. 

» 

Figure 12 illustrates a similar capture of a user's eyes 400, in accord with the 
30 invention. A frame 402 can thus be acquired by a camera of the invention. Figure 
12A illustrates further detail of one representative frame 402, illustrating that the 
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user's pupils 404 are also captured. Figures 3 and 4 describe certain algorithms of the 
invention that are also applicable to motion of the user's pupils 404, as illustrated by 
left and right motion 406 and up and down motion 408. Accordingly, by zooming in 
on the user's eyes, another movement zone is created that causes repositioning of the 
5 cursor or the scene view based upon the movements 406, 408, much like the head 
movement described and illustrated in Figures 1-4. 

Note that the teachings of Figures 1-4 and 12-12A can be combined within a 
two zone movement system so that, for example, both head motion and pupil 

10 motion can be evaluated for image motion. The cursor and/ or scene view can be 
repositioned, therefore, based upon movements from both zones. By way of 
example, repositioning of items within the display (e.g., the cursor and/or scene 
view) can be made when the head moves but not if the head and eyes move, which 
might indicate that the user is simply looking elsewhere in the room due to a 

15 distraction. However, if the user moves his head, but not his eyes, he is focussed on 
the game and intends rotation of the scene view, in another example. Other 
combinations are also possible. 

Cameras of the invention can also include zoom optics which (a) reduce or 
20 enlarge the image frame captured by a particular camera, or which (b) provide 
autofocus capability. Figure 13 shows one system 430 constructed according to the 
invention. A camera 432 includes camera electronics 432a and a zoom attachment 
432b. Data from the camera 432 is relayed to image and interpretation feedback 
electronics 434 for evaluation. For purposes of image magnification, the feedback 
25 electronics serve to evaluate a given image size relative to desired image goals. For 
example, to image the user's eyes with high fidelity might require high density of 
pixels at the user's eyes (e.g., at the zone 370, Figure 11). Accordingly, the system 430 
can isolate the user's eyes, such as described herein, and command the camera 
(through command lines 436) to increase or decrease magnification on the user's 
30 eyes so as to achieve desired resolution. The feedback electronics can also command 
motion of the camera to change its boresight alignment (i.e., to change where the 
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camera image is centered) by commanding movement of the camera when resting on 
one or_more linear drives 438, as known in the art 

Once aligned on the desired user location, e.g., on the eyes with desired 
5 accuracy, the system 430 continues processing data such as described herein to create 
human interface control of items displayed on the user's computer, e.g., cursor 
and/ or scene view. Accordingly, processing section 440 operates to detect user 
motion and to communicate difference information to the user's computer, as 
described above. 

10 . 

With regard to autofocus, the system 430 of Figure 13 can also be used to 
process user motion based upon motion towards and away from the camera. Figure 
14 illustrates such a system, including a camera 450 with autofocus capability to find 
the best focus 452 relative to a user 454 within the field of view 456. For example, 

15 when the user 454 moves to position 460 (the user being shown in outline form 
454a), the new best focus has changed to 452a. The camera 450 provides a signal 450a 
to the image interpretation and feedback electronics 434, Figure 13, which indicates 
where the user is along the "z" axis from the camera 450 to the user 454. This signal 
450a is thus used much like the other motion signals described herein, to move the 

20 cursor and/ or scene view in response to such movements. Figure 14 A illustrates a 
representative scene view 462 when, for example, the user is at best focus 452. The 
scene view 462 includes a house image 464 with a door 465. When the user moves to 
position 460, the house and door 464', 465' of the scene view 462' enlarge, since the 
user moved closer to the camera 450. Such a motion might reveal, for example, 

♦ 

25 additional objects within the house, such as illustrated by object 466, Figure 14B. 
Accordingly, the autofocus feature of the invention provides yet another degree of 
freedom in motion control, in accord with the invention. 

Image data, manipulation, and human interface control can be improved, 
30 over time, by using neural net algorithms. As shown in Figure 13, a neural net 
update section 435 can for example couple to the feedback electronics 434 so as to 
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assimilate movement information and to improve data transmitted to the host 
computer, over time. Use of neural nets are known in the art. 

Figure 15 illustrates a frame of data 490 used in accord with the invention to 

5 implement a simplified left, right, up, down movement algorithm to control cursor 
movement and/ or scene view movement. Frame 490 is captured by a camera of the 
invention; and preferably the camera incorporates autofocus, as described above, to 
provide a crisp image of the user 492 regardless of her position within the camera's 
field of view. As shown, image frame 490 provides very sharp edges to the user's 

10 face, including a left edge 494a, right edge 494b, and chin 494c. These edges need 
only approximate vertical or horizontal position. Movement of the user results in 
movement of the edges 494, such as shown in Figure 15A. Figure 15A shows that 
once such edges are acquired, they conveniently permit subsequent movement 
analysis and control of scene view and/ or cursor position. Specifically, Figure 15 A 

15 shows movement of the user's "edges" from 494a-c to 494a-c', indicating that the 
user moved left (as viewed from the camera's position) and that her chin raised 
slightly, indicating that an upward tilt of the head. This information is assessed by 
the process sections as discussed above and relayed to the host computer as 
difference information to augment or provide cursor and/ or scene movement in 

20 response to the user's movement. 

Note that such edge movements roughly correspond to movement along 
rows and columns of the detector array. Detected movement from one row to 
another (or one column to another) can readily calculate the actual motion of the 
25 user from information of the user's best focus position and from the focal length of 

* 

the camera's lens. This information may then be used to set the magnification of 
movement of items in the computer display (e.g., cursor and/ or scene view). 

Figure 16 illustrates an image of one object 500 used in accord with the 
30 invention to provide image manipulation in response to motion of the object. The 
object 500 is held by the user 501 to manipulate motion of his cursor 502 and/ or 
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scene view 504 on his computer display 506. The object 500 is used because it 
exhibits an optical shape that is easily recognized through image correlation (such as 
matched filtering). In accord with the invention, a camera 510 is used to image the 
object 500; and frames of data are sent to the frame processor 512. The processor 512 

■ 

determines image position - relative to a starting position - and thereafter 
communicates difference information to the user's computer 505 along data line 514. 
The difference information is used by the computer's CPU and operating system to 
reposition items on the display 506 in response to motion of the object 500. Almost 
any motion, including rotation, tilting and translation are accomplished with the 
object 500 relative to a start position. This start position can be triggered by the user 
501 at the start of a game by commanding that the camera 510 take a reference frame 
("REF") that is stored in memory 513. The user 501 commands that REF imagery be 
taken and stored through the keyboard 505a, connected to the computer 505, which 
in turn commands the processor 512 and camera 510 to take the reference frame REF. 

Motion of the object 500 is thus made possible with enhanced accuracy by 
comparing subsequent frames of the object 500 with REF. When motion of rotation, 
tilt or translation are detected (for example, by using the techniques of Figures 2-4, 8- 
8D), then repositioning of items (502, 504) on the display 506 are follow that 
movement. 

The techniques of the invention permit control of the scene view and/ or 
cursor on a computer screen by motion of one or more parts of the user's body. 
Accordingly, as shown in Figure 17, complete motion of the user 598 can be 
replicated, in the invention, by correlated motion of an action figure 599 within a 
game. In Figure 17, user 598 is imaged by a camera 602 of the invention; and frames 
from the camera 602 are processed by process section 604, such as described herein. 
The user 598 is captured and processed, in digital imagery, and annotated with 
appropriate user segments, e.g., segments 1-6 indicating the user's hands, feet, head 
and main body. Motion of the segments 1-6 are communicated to the host computer 
606 from the process section 604. The computer's operating system then updates the 
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associated display 608 so that the action figure 599 (corresponding to an action figure 
within a computer game) moves like user 598. Accordingly, user motion of action 
figure 599 is made by the user 598 by performing stunts (e.g., striking and kicking) 
that he would like the action figure 599 to perform, such as to knock out an 
opponent within the display 608. 

In an alternative embodiment, icons can be used to simplify image and 
motion recognition of user segments such as segments 1-6. For example, if user 598 is 
marked with a star-shaped object on her hand (e.g., segment 1), then that star symbol 
is more easily recognized by algorithms such as described herein to determine 
motion. By way of another example, the hand of user 598 can be covered with a 
glove that has an "+" symbol on the glove. That "+" symbol can be used to more 
easily interpret user motion as compared to, for example, actually interpreting 
motion of the user's hand, which is rounded with five fingers. In a third example, 
user 598 can wear a article of clothing such as shirt 598a with a "+" symbol 598b; and 
the invention can be used to track the icon "+" 598b with great precision since it is a 
relatively easy object to track as compared to actual body parts. It should be 
apparent to those in the art that icons such as symbol 598b can be painted or pasted 
onto the individual too to obtain similar results. 

Figure 18 illustrates a two camera system 700 used to determine translation 
and rotation. The forward viewing camera 702 observes the user's face 703 and 
determines the right-left (Axi) and up-down (Ayi) translation of the user's face 703. 
The top viewing camera 704 observes the top of the user's face or head 705 and 
determines the right-left (AX2) and forward-backward motion (Ay 2) of the user's face 
or head. The two cameras 702, 704 are each processed through motion sensing 
algorithms 706 using the teachings above, and results are shown on the computer 
display 710. For purposes of illustration, the display 710 shows an image of the user; 
while the image can be, for example, an action figure or other computer object 
(including the computer cursor), as desired, which follows tracking motions Axi, Ayi, 
AX2, Ay2. As indicated in Figure 18, for example, Ay2 can be directly applied to 
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motion control of the user's forward and reverse motion (note, these motions are 
illustrated as within a computer display 710 as processed by algorithms 706). Axi 
can be directly applied to the users left-right sideways or strafe motion; Ayi can be 
directly applied to control the users up-down viewpoint, each as illustrated on 
display 710a. The results of the difference between Ax2 and Axi can be applied to 
control the user's left-right turn or viewpoint. 

The techniques of Figure 18 can be further extended to front, side and top 
view cameras for complete motion detection. The top camera determines the user's 
left-right, front-back motion while the front facing camera determines the user's 
rotational up-down, left-right motion. 

Figure 19 describes an algorithm to detect user eye blink. The video imagery 
is stored into a multiple frame buffer 800. The algorithm selects the current frame 
and a frame from the frame buffer and differences these frames using the adder 802. 
The difference frame consists of the pixel by pixel difference of the delayed frame 
and the current frame. The difference frame includes motion information used by 
the algorithms of teachings above. It also contains information on the user eye blink. 
The frames differenced by the adder 802 are separated temporally enough to ensure 
that one frame contains an image of the users face with the eyes open, the other 
image is of the user's face with the eyes closed. The difference image contains a two 
strong features, one for each eye. These features are spatially separated by the 
distance between the user's eyes. The blink detect function 808 inspects the image 
for this pair of strong features which are aligned horizontally and spaced within an 
expected distance based on the variation from one human face to another and the 
variation in seating distance expected from user to user. The recognition of the blink 
features may be accomplished using a matched filter or by recognition of expected 
frequency peaks in the frequency domain at the expected spatial frequency for 
human eye separation. The blink detect function 708 identifies the occurrence of a 
blink to a controlling function to either disable the cursor motion or take some other 
action. 
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Figure 20 illustrates a sound re-recalibration system 800 constructed 
according to the invention. As above, a camera 802 is arranged to view a user, a part 
of a user (e.g., a hand), or an object through the camera's field of view 804. A 

5 processing section 806 correlates the framing image data from camera 802 to induce 
movement of a scene view or cursor on the user's display 810. For purposes of 
illustration, the scene view or cursor is shown illustratively as a dot 808 on display 
810; and movement 812 of the cursor 808 from position 808a to 808b represents a 
typical movement of the cursor or scene view 808 in response to movement within 

10 the field of view 804, as described above. A re-calibration section 816 is used to reset 
the cursor or scene view 808 back to an initial position 808a, if desired. Specifically, 
in one embodiment, section 816 is a microphone that responds to sound 818 
generated from a sound event 818a, such as a snap of the user's fingers, or a 
particular word uttered by the user, to generate a signal for processing section 806 

15 along signal line 816a; and section 806 processes the signal to move the cursor or 
scene view 808 back to original position 808a. In another embodiment, re-calibration 
section 816 can also correspond to a processing section within the processing 
hardware/ software of system 800 to, for example, respond to the blink of a user's 
eyes to cause movement of the cursor 808 back to position 808a. 

20 

The invention thus attains the objects set forth above, among those apparent 
from the preceding description. Since certain changes may be made in the above 
methods and systems without departing from the scope of the invention, it is 
intended that all matter contained in the above description or shown in the 
25 accompanying drawing be interpreted as illustrative and not in a limiting sense. By 
way of example, although FFT correlation is often discussed in the above 
description, it should be apparent to those skilled in the art that other correlation 
techniques can be used with the invention to achieve similar results without 
departing from the scope hereof. 

30 

The following Matlab source code provides non-limiting computer code 
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suitable for use to control the cursor on a display such as described herein. The 
Matlab source code thus provides an operational demonstration of the concepts 
described and claimed herein. The Matlab source code is platform independent and 
needs only a sequence of input images. It includes a centroid operation on the 

5 correlation peak which is not included on the PC version (described below), 
providing a finer measurement on the motion in the image. More particularly, the 
centroid operation provides a refinement on locating the correlation peak. The PC 
code, discussed below, uses the pixel location nearest the correlation peak while the 
centroiding operation improves the resolution of the peak location to levels below a 

10 pixel. 



% Copyright (C) 1998 

% Video Mouse Group Partnership 

% 

15 % This following script file reads in a sequence of images of a computer user's face. 
% I t then processes the image sequence using the methods of difference 
% frame correlation processing used for a human-computer interace. 
% This code includes a centroiding operation and demonstrates the 
% difference frame correlation approach. 

20 

[filenameJilepathl^igetfileCdAvideomouse^.mat'); 

files=dir([filepath '".mat 1 ]); 

previousFrame=zeros(64); 

inputMatrix 1 =zeros(64); 
25 inputMatrix2=zeros(64); 

cursorPosX(l)=0; 

cursorPosY(l)=0; 

for i=l:size(files,l); 
load([filepath ■frame' num2str(i)]); 
30 camera=conv2(double(camera'),ones(2)/4 5 , same'); 
camera=camera(l :2:480,1 :2:640); 

camera=round(camera( 1 20-32: 1 20+3 1 , 1 60-32: 1 60+3 1 )/4); 
if mod(i,2) 

%Compute difference frame 
3 5 inputMatrix2=camera-previousFrame; 

%Save current difference frame for next iteration 

previousFrame=camera; 

%FFT difference frame 

inputMatrix2=fft2(inputMatrix2); 
40 %Perform difference frame correlation by multiplying difference frame by 

%complex conjugate of previous frame 
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correlationMatrix=0.00001 *conj(inputMatrix2.*conj(inputMatrixl )); 
else 

%Compute difference frame 
inputMatrix 1 =camera-previousFrame; 
5 %Save current difference frame for next iteration 
previousFrame=camera; 
%FFT difference frame 
inputMatrix 1 =fft2(inputMatrix 1 ); 

%Perform difference frame correlation by multiplying difference frame by 
10 % complex conjugate of previous frame 

correlationMatrix=0. 00001 *conj(inputMatrixl.*conj(inputMatrix2)); 
end 

%Compute inverse FFT on correlation matrix 
con*elationMatrix==real(fft2(correlationMatrix)); 
15 %Shift correlation matrix to center correlation peak 
temp=fftshift(correlationMatrix); 
%Find maximum value of correlation matrix 
correlationPeak=max(correlationMatrix(:)); 

%Perform centroiding on correlation peak in order to find peak location 
20 %The centroiding operation is not currently incorporated into the application version 
mask~temp>.50*correlationPeak(l); 
maskSize(i)=sum(mask(:)); 
ifmaskSize(i)<100 
col Sum s=sum((temp- . 5 0* correlationPeak( 1 )) . * mask) ; 
25 xPos(i)=sum(colSums.*(-32:3 1 ))/sum(colSums); 

rowSums=sum((temp , -.50*correlationPeak(l)).*mask t ); 
yPos(i)=sum(rowSums.*(-32:31))/sum(rowSums); 
else 
xPos(i)=0; 
30 yPos(i)=0; 
end 
ifi>l 

cursorPosX(i)=cursorPosX(i-l)+xPos(i); 
cursorPos Y(i)=cursorPosY(i- 1 )+yPos(i); 
35 end 
end 
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The following PC source code, labeled videoMouseDlg.doc and 
videoMouseDSP.doc, provide non-limiting and nearly operable DSP code for control 
of the cursor, as described herein. The code is not smooth; and there are other files 
required to compile this code to an executable, as will be apparent to those skilled in 
the art, including header files (*.h), resource files and compiler directives. 
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40 



* 
* 



Copyright (C) 1998 

Video Mouse Group Partnership 



* Module : dspcode.c 



/**; 

/* 
*/ 



MODIFICATION HISTORY 



/ 



i** — 
/* 

* Include files 
*/ 



MODULE ENVIRONMENT; 



:** 



#include 
#include 
#include 
#include 
#include 
#include 



"dddefs.h" 

"ddptypes.h" 

"dberrors.h" 

"xpgreg.h" 

"grabisr.h" 

"intpt40.h" 



/* XPG definitions and prototypes. */ 
/* XPG function prototypes. */ 
/* XPG error codes. */ 
/* XPG register definitions */ 
/* grab isr include file */ 

. /* Parallel Runtime Support Library intertuupts header */ 



^include "protocol.h" /* Protocol constants defined for this application */ 



I** - 

i* 

* Function prototypes. 
*/ 



/ 



extern VOID DBU_Appl Interrupt (VOID); 
extern VOID DDK_PKTDelay (ULONG); 



/** 
I* 

* Global Variables 
*/ 
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15 



20 



volatile LONG 



GJApplUserMasklntCount = 0; 



/** — 
/*== 

* Name 



CODE 



:DBU Correlation 



* Parameters : 

* llnputlmagel Image number of the 1st input image. 

* Hnputlmage2 Image number of the 2nd input image. 
* 

* Returns Error status 

* P_SUCCESS 

* IMotionX 

* IMotionY 

* lFrameRate 
* 

*— — — ===== — 



/ 



=*/ 



25 



30 



35 



/* FFT STUFF FROM TI */ 

#defme FFTSIZE 64 

#define LOGFFTSIZE 6 
#defme FFTSIZEx2 128 

#define BLOCK0 0x002ff800 



/* FFT size (n) 
/* log(FFT size) 



extern void cfficQ; 



/* C-callable complex FFT */ 



*/ 
*/ 



/* on-chip RAM buffer */ 



*/ 
*/ 



float complexMatrixl [FFTSIZE] [FFTSIZEx2]; /* Input matrix 
float complexMatrix2[FFTSIZE][FFTSIZEx2]; /* Input matrix 
float correlationMatrix[FFTSIZE] [FFTSIZEx2] ; 
long previousFrame[FFTSIZE] [FFTS IZE] ; 
float *pJocalRam,*fPtrl,*fPtr2; 
float correlationPeak; 

float *block0 = (float *)BLOCK0, *mml [FFTSIZE], *mm2 [FFTSIZE], *mm3 [FFTSIZE]; 



40 



45 



/♦End FFT Stuff 5 / 

LONG DBUJJserFunction (LONG llnputlmagel, LONG Hnputlmage2) 

{ 

PROCESSINGINFO Imagelnfol; 
PROCESSINGINFO ImageInfo2; 
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LONG lErrorStatus; 

LONG lFifoStatus; 

LONG lOutputFifoStatus; 
5 LONG lFrameCount = 0 ; 

LONG IStatus; 

register LONG *plAddress 1 ; 

register LONG *plAddress2; 

register LONG *p_imageAddress; 
10 long*lPtr; 

float *currentDifference; 

float *previousDifference; 

ULONG ulPCData; 

15 ULONG ulTempO; 

ULONG ulValue; 

ULONG pulPacket[5]; 

long peakRow; 
20 long peakCol; 

long row; 

long col; 

long pixel; 

long index 1; 
25 long index2; 

long index3;« 



int targetCol= 16,targetRow = 16,targetDirection=0; 

30 

/* */ 

/* Initialize some variables. */ 
/* */ 

35 lErrorStatus = PJSUCCESS; 

ulPCData = APPLICATION JtUNNING; 
lFifoStatus = P JEMPTY; 

/* */ 

40 /* Install the ISR found in the file INTERRUP.ASM to interrupt */ 
/* resource IIOF0. Initialize the GJApplUserMasklntCount. */ 
/* */ 

DDF JSRSetllOFO (P_INTERRUPT_USER_MASK, (VOID *) 
45 DBU_App 1 Interrupt); 

GJApplUserMasklntCount = 0; 
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/* */ 

/* Set the ID of the second image for double buffering. */ 
/* Perform a quick grab setup on the input image number. */ 
/* Call Quick Grab with no wait. This application runs the grab */ 
5 /* in continuous mode, the grab will not return until the DMA */. 

/* has started. */ 

/* */ 

if ((IStatus = DBF_SetSecondImageDD (P_DEF AULT_QGS , Hnputlmage2)) 
10 !=P_SUCCESS) 

{ 

ulPCData = END_APPLICATION_REQUEST; 
lErrorStatus = IStatus; 
} /* End if. */ 

15 

if ((IStatus = DBF_QuickGrabSetup (P_DEFAULT_QGS , llnputlmagel)) 
!= P_SUCCESS) 

{ 

ulPCData = END_APPLICATION_REQUEST; 
20 lErrorStatus = IStatus; 

} /* End if. */ 

♦ 

if ((IStatus = DBF_QuickGrab (P_DEFAULT_QGS, P_GRAB_INIT, 
P_GRAB_NO_WAIT)) 
25 != P_SUCCESS) 

{ 

ulPCData = END_APPLICATION_REQUEST; 
lErrorStatus = IStatus; 
} /* End if. */ 

30 



/* */ 

/* Enable hardware interrupts on IIOF0 */ 
/* */ 

35 INT_ENABLE (); 

set_iif_flag (IIOF0_EIIOF); 

/* */ 

/* Initialize pointers to the two image buffers */ 
40 /* */. 

if ((IStatus = DBK_MmtGetImageInfo (llnputlmagel, &ImageInfol)) 
!= P_SUCCESS) 

{ 

45 ulPCData = END_APPLICATION_REQUEST; 

lErrorStatus = IStatus; 
} /* End if. */ 
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if ((IStatus = DBK_MmtGetImageInfo (Hnputlmage2, &ImageInfo2)) 

!= P_SUCCESS) 

{ 

ulPCData = END_APPLICATION_REQUEST; 
lErrorStatus = IStatus; 
} /* End if. */ 

plAddressl = (LONG *) (ImagelnfolPROJMappedAddress); 
plAddress2 = (LONG *) (ImageInfo2.PROJvlappedAddress); 

/* FFT Initialization Stuff */ 

asm(" or 1800h,st"); /* cache enable */ 

/*End FFT Initialization */ 



/* */ 

/* While the input fifo from the PC does not contain */ 
/* any data, continue processing frames and returning */ 

/* the results to the PC. */ 
/* */ 



while (ulPCData != END_APPLIC ATION_REQUEST) { 
if (lFrameCount < (GJApplUserMaskIntCount)){ 

lFrameCount = GJApplUserMasklntCount; 

pJmageAddress = (GJApplUserMasklntCount % 2) 
? plAddressl : plAddress2; 
if (GJApplUserMasklntCount % 2){ 

p JmageAddress=plAddress2;*/ 

currentDifTerence=&complexMatrix2[0][0]; 

previousDifference=&complexMatrixl[0][0]; 

} 

else{ 

p JmageAddress=pl Address 1 ;*/ 
currentDif ference=&complexMatrix 1 [0] [0] ; 
previousDifference=&complexMatrix2 [0] [0] ; 

} 

I************************** Compute FFT on Difference Frame Rows *************/ 

for (row=0;row<FFTSIZE;row++) { 

for (col=0;col<FFTSIZE;col++){ 

lPtr=pJmageAddress+2*col+256*row; 
pixel=*lPtr+*(lPtH-l)+*(lPtr+128)+*(lPtr+129); 
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/♦Compute Difference Frame */ 
blockO[2*colHfloat) (pixel - 

previousFrame[row][col]); 

block0[2*col+l]=0.0; 

/♦Save current frame for next iteration*/ 

previousFrame[row][col]=pixel; 

} 

cfftc(blockO,FFTSIZE,LOGFFTSIZE); 

fPtrl=currentDifference+row*FFTSIZEx2; 
for (index 1 =0;index 1 <FFTSIZEx2 ;index 1 ++) 

fPtrl [indexl]=blockO[indexl]; 

} 

for (col=0;col<FFTSIZE;col++) { 
index3=2*col; 

for (index2=0;index2<FFTSIZEx2;index2=index2+2) { 
blockO[index2]=currentDifference[index3]; 
block0[index2+ 1 ]==currentDifference[index3+ 1 ] ; 
index3+=FFTSIZEx2; 

} 

/"Complete column FFT of difference frame*/ 
cfftc(blockO,FFTSIZE,LOGFFTSIZE); 



index3=2*col; 

for(index2=0;index2<FFTSIZEx2;index2=index2+2){ 

/*Save FFT of difference frame */ 

currentDifference[index3]=blockO[index2]; 

currentDifference[index3+l]==blockO[index2+l]; 

blockO[index2]=currentDifference[index3] 
*previousDifference[indexl ] 
+currentDifference[index3+l] 
*previousDifference[indexl+l]; 

blockO[index2+l]= : currentDifference[index3] 
*previousDifference[indexl +1 ] 
-currentDifference[index3+l] 
*previousDifference[index 1 ] ; 

index3+=FFTSIZEx2; 

} 

/♦Compute inverse FFT for correlation matrix column*/ 
cfflc(blockO,FFTSIZE,LOGFFTSIZE); 

/♦Save to correlation frame*/ 
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fPtr 1 =&correlationMatrix[0] [0] ; 
index3=2*col; 

for (index2=0;index2<FFTSIZEx2;index2=index2+2){ 
5 fPtrl [index3]=block0[index2] ; 

fPtr 1 [index3+ 1 ]=blockO [index2+ 1 ] ; 
index3+=FFTSIZEx2; 

} 

} 

10 conrelationPeak=- 100000000; 

for (row=0;row<FFTSIZE;row++) { 

fPtrl =&correlationMatrix[row] [0] ; 
for (index 1 =0;index 1 <FFTSIZEx2 ;index 1 ++) 
blockO[indexl]=fPtrl [index 1]; 
15 /* Inverse FFT on correlation matrix row */ 

cfftc(block0,FFTSIZE,LOGFFTSIZE); 

fPtr 1 =&correlationMatrix[row] [0] ; 
for (col=0;col<FFTSIZE;col++){ 
20 index l=2*col; 

if (correlationPeak<block0[indexl]) { 

correlationPeak=blockO[indexl ]; 

peakRow=row; 

peakCol=indexl; 



25 } 

fPtr 1 [index 1 ]=block0[index 1 ] ; 

} 

} 

30 

/* */ 

/* Send protocol, lAverage and lFrameCount to the PC. */ 
/* */ 

35 pulPacket[0] = APPLICATION JIUNNING; 

pulPacket[l] = peakCol; 
pulPacket[2] = peakRow; 
pulPacket[3] = (long) (correlationPeak*.0001); 

DDK_PKTSend (P_PACKET__USER_INTERFACE, 



40 pulPacket, 

4 * sizeof (LONG),P_WAITFORCOMPLETE, 
P_PCJNTERRUPT); 
} /* End if. */ 

45 DDK__PKTInterfaceStatus (P_PACKET_USER_INTERFACE, &lFifoStatus, 

&10utputFifoStatus); 
if (lFifoStatus != P_EMPTY){ 
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DDK_PKTRecv (P_PACKET_USER_INTERFACE, 

pulPacket, 

4 * sizeof (LONG),P_WAITFORCOMPLETE, 

P_NO_PC_INTERRUPT); 

ulPCData = pulPacket[0]; 

} /* End if. */ 
} /* End while. */ 



/* */ 

10 /* Disable the IIOFO interrupt. */ 
/* */ 

reset_iif_flag (IIOF0_EIIOF); 
DDFJSRDisablellOFO (); 

15 

/* r */ 

/* Abort the continuous grab. */ 
/* */ 

20 DBF_AbortGrab (); 

DBF_QuickGrabStatus (P_GRAB_WAIT); 

/* . */ 

/* Send back a protocol word indicating the */ 
25 /* last packet of data. (Pad to correct size) */ 
/* */ 



ulValue = APPLICATIONJERMINATED; 
DDK PKTSend (P_P ACKET_USER_INTERFACE, &ulValue, 1 L, 
30 P_WAITFORCOMPLETE, 

P_NO_P CJNTERRUPT) ; 
DDK_PKTSend (P_PACKET_USER_INTERFACE, &lErrorStatus, 1L, 

P_WAITFORCOMPLETE, P_NO_PC_INTERRUPT); 
DDK_PKTSend (P_PACKET_USER_INTERFACE, &lErrorStatus, 1L, 
35 P_WAITFORCOMPLETE,P_PC_INTERRUPT); 

/* */ 

/* Empty USER input FIFO and output FIFO. */ 

/* (Host won't get the END message until */ 

40 /* the output FIFO is empty !) */ 
/* */ 

DDK_PKTInterfaceStatus (P_PACKET_USER_INTERFACE, &lFifoStatus, 
&10utputFifoStatus); 

45 

while ((IFifoStatus != P_EMPTY) || (lOutputFifoStatus != P_EMPTY)){ 
if (lFifoStatus != P_EMPTY){ 
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DDK_PKTRecv (PJPACKETJJSER_INTERFACE, 

&ulPCData, 1L, 

P_WAITFORCOMPLETE, P_NQ__PC^INTERRUPT); . 

} /* End if, */ 

DDK_PKTInterfaceStatus (PJPACKET_USERJNTERFACE, &lFifoStatus, 

&10utputFifoStatus); 

} /* End while. */ 
return PJSUCCESS; 
} /* End of the DBUJJserFunction function. */ 



15 /* 
* 



Copyright (C) 1998 

Video Mouse Group Partnership 



20 /' 



** 

*/ 

// videomouseDlg.cpp : implementation file 

// 

#include "stdafx.h" 
. #include "videomouse.h" 
#include "videomouseDlg.h" 

30 ^include "dpdefs.h" /* XPG definitions and prototypes. */ 

#include "dpptypes.h" /* XPG function prototypes. */ 

include "dberrors.h" /* XPG error codes. */ 

#include "protocol.h" /* Protocol constants define for this application */ 

#include <conio.h> /* getch */ 

35 #include <math.h> 

static int s runDSPLoopThreadProc; 

#ifdef_DEBUG 
40 #define new DEBUG_NEW 
#undef THIS_F1LE 
static char THIS_FILE[] = _FILE_; 
#endif 

45 #defineP_ID_USER_FUNCTIONl 0L 
#defmeP PCOUNT USER FUNCTION 1 2 
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#defme FRAMESIZE 64 

UINT DSPLoopProc(LPVOID pclass) 

{ ~ • " * 

CVideomouseDlg& pcdd = *(reinterpret_cast<CVideomouseDlg*>(pclass)); 
CString dataString; 

int savedCommandMode - DPKJXCCSetCommandType (PJJSER); 
DPKJXCCSetWaitMode (P_NO_WAIT); 

DPK JCCCPushOpcode (P JD JJSER JTJNCTION1 , 
P_PCOUNT_USER - _FUNCTION 1 ); 

DPK_XCCPushLong ((unsigned long) pcdd.m_inputImageNumber2); /* 2 */ 

DPK_XCCPushLong ((unsigned long) pcdd.mJnputlmageNumberl); /* 1 */ 

DPK_XCCSetCommandType (savedCommandMode); 

long status = DPK_XCCCheckStatus (P_PCOUT, P_XCCFIX); 

double framesPerSecond; 

time_t start, finish; 

time( &start ); 

int counter=0; 

if (status=P_SUCCESS) { 

while(srunDSPLoopThreadProc) { 
counter++; 

status = DPK_PKTRecv (P_PACKETJJSER_INTERFACE, (HVOID 

pcdd.m_DSPPacket, 4 * sizeof (LONG), 
P_WAIT_COMPLETE); 

long protocol = pcdd.mJDSPPacket[0]; 
dataString.Format( ,, %d n ,pcdd.m_DSPPacket[l]); 
pcdd.m_average.SetWindowText(dataString); 
dataString.Format("%d ,, ,pcdd.m_DSPPacket[2]); 
pcdd.m_frameNumber.SetWindowText(dataString); 

time( &finish ); 

double elapsedTime = difftime( finish, start ); 
framesPerSecond=(double) counter/ (double)elapsedTime; 
long max = pcdd.m_DSPPacket[3]; 

if (max>0.0){ 

double detectx=(double) pcdd.m_DSPPacket[2]; 
double detecty=0.5*(double)pcdd.m_DSPPacket[l]; 

if (detectx > 31) detectx =detectx-FRAMESIZE; 
if (detecty > 31) detecty = FRAMESIZE-detecty; 
else detecty = -detecty; 
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// double multiplier;/ 

// if (detectx<2) multiplier^ .0; 

// else if (detectx<10) multiplier = exp((detectx-2.0)/2.5); 

// else multiplier^); 

// detectx*=multiplier; 

// if (detecty<2) multiplier=l .0; 

// else if (detecty<l 0) multiplier = exp((detecty-2.0)/2.5); 

// else multiplier^; 

// detecty*=multiplier; 



static POINT ptCursor; 
GetCursorPos(&ptCursor) ; 

ptCursor.x-=(long) detectx; 
ptCursor.y-=(long) detecty; 

SetCursorPos(ptCursor.x,ptCursor.y); 

} 

} 

savedCommandMode = DPK_XCCSetCommandType 

(savedCommandMode); 

DPKXCCSetWaitMode (P_WAIT_COMPLETE); 

DPK_EndPCK (); 
AfxMessageBox("Exited Thread"); 

} 

else{ 

s_runDSPLoopThreadProc=false; 
savedCommandMode = DPK_XCCSetCommandType 
(savedCommandMode) ; 

DPK__XCCSetWaitMode (P_^WAIT_COMPLETE); 

DPKEndPCK (); 
AfxMessageBox("Exited Thread"); 

} 

time( &finish ); 

double elapsedTime = difftime( finish, start ); 
framesPerSecond=(double) counter/ (double)elapsedTime; 

return 1; 

} 

///////////////////////////////////////////////////////////////////////////// 
II CAboutDlg dialog used for App About 

class CAboutDlg : public CDialog 
{ 

public: 

CAboutDlgO; 
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// Dialog Data 

//{ (AFX_DATA(CAboutDlg) 
enum { EDD = IDD_ABOUTBOX } ; 
//}}AFX_DATA 

// ClassWizard generated virtual function overrides 

//{ (AFX_VIRTUAL(CAboutDlg) 

protected: 

virtual void DoDataExchange(CDataExchange* pDX); // DDX/DDV support 
//}}AFX_VIRTUAL 

// Implementation 

protected: 

// { { AFX_MSG(CAboutDlg) 

//}}AFX_MSG 

DECLARE_MESSAGE_MAP() 

}; 

CAboutDlg::CAboutDlg() : CDialog(CAboutDlg::IDD) 
{ 

// { { AFX_D ATA_INIT(CAboutDlg) 
//} } AFX_DATA_INIT 

.} 

void C AboutDlg: :DoDataExchange(CDataExchange* pDX) 
{ 

CDialog: :DoDataExchange(pDX); 
// { { AFX_DATA_M AP(CAboutDlg) 
//} } AFX_DATA_MAP 

} 

BEGIN_MESSAGE_MAP(CAboutDlg, CDialog) 

//{ {AFX_MSG_MAP(CAboutDlg) 
// No message handlers 

//}}AFX_MSG_MAP 
END_MESSAGE_MAP() 

HHUUimiumuiiuiuimnuuiiHuiuuiiiiiiiiiiiiiiiiiiiiiiiii 

II CVideomouseDlg dialog 

CVideomouseDlg::CVideomouseDlg(CWnd* pParent /*=NULL*/) 
: CDialog(CVideomouseDlg::IDD, pParent) 

{ 

// { { AFX_DATA_INIT(CVideomouseDlg) 
//} } AFX_DATA_INIT 

// Note that Loadlcon does not require a subsequent Destroylcon in Win32 
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m_hIcon = AfxGet App()->LoadIcon(IDR_MAINFRAME) ; 

} 

void CVideomouseDlg::DoDataExchange(CDataExchange* pDX) 
{ 

CDialog: :DoDataExchange(pDX); 
//{ (AFX_DATA_MAP(CVideomouseDlg) 
DDX_Control(pDX, IDCFRAMENUMBER, m_frameNumber); 
DDX_Control(pDX, IDC_AVERAGE, m_average); 
//} } AFX_DATA_MAP 

} 

BEGIN_MESSAGE_MAP(CVideomouseDlg, CDialog) 
// { { AFX_MSG_MAP(CVideomouseDlg) 
ON_WM_SYSCOMMAND() 
ON_WM_PAINT() 

ON_WM_QUERYDRAGICON0 
ON_BN_CLICKED(IDC_ENABLE, OnEnable) • 
ON_BN_CLICKED(IDC_STOP, OnStop) 
//}}AFX_MSG_MAP 
ENDMESSAGEMAPO 

lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 
II CVideomouseDlg message handlers 

BOOLCVideomouseDlg::OnInitDialog() 

{ 

CDialog: :OnInitDialog(); 
// Add "About..." menu item to system menu. 

// IDM_ABOUTBOX must be in the system command range. 
ASSERT((IDM_ABOUTBOX & OxFFFO) = IDM_ABOUTBOX); 
ASSERT(IDM_ABOUTBOX < OxFOOO); 

i 

CMenu* pSysMenu = GetSystemMenu(FALSE); 
if(pSysMenu !=NULL) 

{ 

CString strAboutMenu; 

strAboutMenuXoadString(IDS_ABOUTBOX); 
if (istrAboutMenu.IsEmptyO) 

{ 

pSysMenu->AppendMenu(MF_SEPARATOR); 
pSysMenu->AppendMenu(MF_STRING, IDM_ABOUTBOX, 

strAboutMenu); 

} 

} 
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// Set the icon for this dialog. The framework does this automatically 
// when the application's main window is not a dialog 
SetIcon(m_hIcon, TRUE); // Set big icon 

SetIcon(m_hIcon, FALSE); // Set small icon 

InitializeFrameGrabber(); 

return TRUE; // return TRUE unless you set the focus to a control 



10 } 



void CVideomouseDlg::OnSysCommand(UINT nID, LP ARAM lParam) 

{ 

. if ((nID & OxFFFO) = IDM_ABOUTBOX) 
15 { 

CAboutDlg dlgAbout; 
dlgAbout.DoModal(); 

} 

else 
20 { 

CDi alog : : OnSy sCommand(nID , IP aram) ; 

} * 

} 

25 // If you add a minimize button to your dialog, you will need the code below 
// to draw the icon. For MFC applications using the document/view model, 
// this is automatically done for you by the framework. 

void CVideomouseDlg::OnPaint() 
30 { 

if (IsIconic()) 
{ 

CPaintDC dc(this); // device context for painting 
35 SendMessage(WMJCONERASEBKGND, (WPARAM) dc.GetSafeHdc(), 0); 

// Center icon in client rectangle 
int cxlcon = GetSystemMetrics(SM__CXICON); 
int cylcon = GetSystemMetrics(SM^CYICON); 
40 CRect rect; 

GetClientRect(&rect); 

int x = (rect.Width() - cxlcon + 1) / 2; 

int y = (rect.HeightQ - cylcon + 1) / 2; 

45 // Draw the icon 

dc.DrawIcon(x, y, mhlcon); 

} 
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else 
{ 

CDialog::OnPaint(); 

} 

5 } 

// The system calls this to obtain the cursor to display while the user drags 
// the minimized window. 

HCURSOR CVideomouseDlg::OnQueryDragIcon() 
10 { 

return (HCURSOR) m_hIcon; 

} 

void CVideomouseDlg::OnEnable() 
15 { 

srunDSPLoopThreadProc = true; 

m_pDSPLoopThread = AfxBeginThread (DSPLoopProc, this); 

} 



void CVideomouseDlg::OnStop() 
{ 

if(s_runDSPLoopThreadProc) { 
25 m_DSPPacket[0] = END_APPLICATION_REQUEST; 

DPK PKTSend (P_PACKET_USER_INTERFACE, m_DSPPacket,4 * sizeof 

(LONG), 

P_WAIT_COMPLETE); 
s_runDSPLoopThreadProc=false; 

30 } 
} 

#define INIT_FAILURE 1 

long CVideomouseDlg: :InitializeFrameGrabberO 

{ 

35 DPKJnitPCK(l); 

if ((m_status=DPK_InitXPG (0, P_TFB_RELOAD_COFF_FILE | 

P_IFB_CHECK_REVISION,"videoMouse.out")) != P_SUCCESS){ 
DPK_EndPCK (); 

40 m_errorMessage.Format("Error initializing FPG, status = %ld.\n", m_status); 

AfxMessageBox(m_errorMessage); 
return INIT_FAILURE; 

} 

DPK_XCCSetWaitMode (P_WAIT_COMPLETE); 
45 long cpsNumber = DPF_LoadCPF("vidmouse.cpf '); 

if (cpsNumber != P_SUCCESS) { 

m_errorMessage.Format("Error loading a CPF, status = %ld.\n", m_status); 
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AfxMessageBox(m_errorMessage); 
DPK_EndPCK (); 
return INIT_FAILURE; 

} 

5 if ((mjstatus = DBF_SelectCPS (cpsNumber)) < P_SUCCESS) { 

m_errorMessage.Format("Error selecting a CPS, status = %ld.\n", m_status); 
AficMessageBox(m_errorMessage); 
DPK_EndPCK (); 
return INITFAILURE; 

10 } 

mj>tatus=DBF_SetGrabWindow(PJWAULT_QGS, 256,128,176,128); 

if ((m_status = DBF_GetGrab Window (P_DEF AULT_QGS , &m_startCol, 
15 &m_numCols, 

&m_startRow, &m_numRows)) !=P_SUCCESS){ 

m_errorMessage.Format( M Error DBF_GetGrab Window: %ld.\n", m_status); 
AfxMessageBox(m_errorMessage); 
DPKJEndPCK (); 
20 return INITJFAILURE; 

} 

if ((m_status = DBK_MmtCreateImage (m_numCols, mjumRows, 

P_DATA_SIZE_BYTE, 
25 PDATA JTYPE JNTEGER, 2, &m JnputlmageNumberl , 

&m_nurnberImagesCreated)) 

!=P_SUCCESS){ 

m_errorMessage.Format("Error creating the input image, status = %ld.\n", 

m_status); 

30 AfxMessageBox(m_errorMessage); 

DPKJEndPCK (); 
return INITJFAILURE; 

} 

m_inputImageNumber2 = mjnputlmageNumberl + 1; 

35 

return 0; 

} 



40 It should be understood that the following claims are to cover all generic and 

specific features of the invention described herein, and all statements of the scope of 
the invention which, as a matter of language, might be said to fall there between. 



Having described the invention, what is claimed is: 
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1. A human motion following controller for augmenting motion of items shown 
on a computer display, the display being coupled to a computer of the type which 
controls positioning of the items through operating system controls, comprising: 

5 a camera for capturing frames of data corresponding to a first image of at least part 
of a user at the computer display; 

signal processing means coupled to the camera for (a) detecting differences between 
successive frames of data corresponding to motion of the first image, and (b) 
10 communicating differences information to the computer to reposition display of the 
items through operating system controls, the items being repositioned on the display 
by an amount corresponding to the motion of first image. 

2. A controller of claim 1, wherein the items comprise a computer cursor. 

15 

3. A controller of claim 1, wherein the items comprise a scene view. 

4. A controller of claim 1, further comprising a PC card for installation within 
the computer and for communication on a computer bus, the signal processing 

20 means being substantially resident with the PC card for communicating differences 
information to the bus. 

5. A controller of claim 1, wherein the camera comprises means for capturing 
augmented frames of data corresponding to a second image of part of the user at the 

25 computer display, the signal processing means further comprising means for 
detecting differences between successive augmented frames of data corresponding 
to motion of the second image and for communicating augmented difference 
information to the computer to reposition display of the items through operating 
system controls, the items being repositioned on the display by an amount 

30 corresponding to motion of the first and second images. 
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6. A controller of claim 1, further comprising frame difference electronics for 
storing and subsequently subtracting pixel-by-pixel difference data. 

■ 

7. A controller of claim 6, wherein the difference electronics comprise multiple 
5 frame memory, a subtraction circuit, and a state machine controller/ memory 

addresser to control data flow. 

8. A controller of claim 1, further comprising N frame video memory for storing 
frames of image data. 

10 

9. A controller of claim 1, further comprising a DSP for implementing select 
algorithms on difference frames or raw frames of image data, 

10. A controller of claim 9, further comprising memory selected from the group 
15 of EPROM and RAM. 

11. A controller of claim 9, further comprising means for interfacing the DSP to a 
PCI bus in the computer. 

20 12. A controller of claim 1, further comprising MPEG compression electronics for 
compressing video for the computer. 

13. A controller of claim 1, wherein the signal processing means comprises frame 
differencing means for removing unchanged information from image frames. 

25 

14. A controller of claim 1, wherein the signal processing means comprises frame 
memory to buffer one or more image frames. 

15. A controller of claim 14, wherein the signal processing means comprises a 
30 frame differencer for reading a delayed frame from the frame memory and for 

subtracting the delayed frame from a current image frame. 
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♦ 

16. A controller of claim 1, wherein the signal processing means comprises 
correlation means for determining row and column shifts corresponding to 
differences between a current image frame and a delayed image frame. 

5 17. A controller of claim 16, wherein the signal processing means comprises best 
fit algorithm means for minimizing the shifts to provide alignment. 

18. A controller of claim 17, wherein the best fit algorithm means utilizes a peak 
detect algorithm. 

10 

19. A controller of claim 1, wherein the signal processing means comprises video 

■ 

cursor control for enabling and alternatively disabling cursor control. 

20. A controller of claim 19, wherein the video cursor control comprises means 
15 responsive to keystrokes at the computer. 

■ 

21. A controller of claim 19, wherein the video cursor control comprises means 
responsive to a blink of an eye of the user. 

20 22. A controller of claim 19, wherein the video cursor control comprises means 
responsive to sound generated by the user. 

23. A controller of claim 22, further comprising a microphone to detect the sound. 

25 24. A controller of claim 1, wherein the signal processing means comprises a 
complex multiplier for providing a two dimensional inverse FFT operation. 

25. A controller of claim 24, wherein the signal processing means comprises a 
peak detect for determining a shift associated with aligning difference images. 

30 
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» 

26. A controller of claim 1, wherein the signal processing means comprises FFT 
means for providing a two dimensional FFT of image data. 

27. A controller of claim 1, wherein the signal processing means comprises means 
5 for identifying parts of the user, the parts being selected from the group of a hand, 

elbow, head, neck, ears, and forehead. 

28. A controller of claim 1, wherein the signal processing means comprises means 
for detecting left and right movement of a head of the user and for shifting the items 

10 in response to the left and right movement. 

29. A controller of claim 1, wherein the signal processing means comprises means 
for detecting rotational movement of a head of the user and for rotating the items in 
response to the left and right movement. 

15 

29. A controller of claim 1, wherein the signal processing means comprises means 
for repositioning the items, if appropriate, at approximately every l/30 th of a second. 

30. A controller of claim 1, wherein the signal processing means comprises means 
20 for repositioning the items at a selected magnification as compared to actual 

movement of the user. 

31. A controller of claim 1, wherein the signal processing means comprises means 
for storing image data of a head of the user at various orientations relative to the 

25 camera and for correlating image data to the stored image data to define head 
orientation, the head orientation being used to reposition the items. 

32. A controller of claim 1, wherein the signal processing means comprises IR 
means for detecting heat associated with the user and for repositioning the items at a 

30 rate correlated to the heat. 
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33. A controller of claim 32, wherein the items correspond to computer gaming 
display images. 

34. A controller of claim 32, wherein the heat corresponds to user stress. 

5 

35. A controller of claim 1, further comprising at least one other camera arranged 
to take images of at least a second part of the user. 

36. A controller of claim 35, wherein the one other camera takes image data in a 
10 second electromagnetic spectrum. 

37. A controller of claim 1, wherein the camera comprises a DSP. 

38. A controller of claim 37, wherein the DSP processes difference information for 
15 the computer. 

39. A controller of claim 1, wherein the signal processing means comprises a CPU 
within the computer. 

20 40. A controller of claim 1, wherein the signal processing means comprises means 
for processing multiple image zones in frames of image data and for repositioning 
the items according to characteristics between zones. 

41. A controller of claim 40, wherein one zone comprises image data 
25 corresponding to at least one eye of the user. 

42. A controller of claim 41, wherein the signal processing means comprises 
means for determining a blink of the eye. 

30 43. A controller of claim 42, wherein the signal processing means comprises 
means for disabling and alternatively enabling cursor control based upon the blink. 
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44. A controller of claim 1, wherein the signal processing means comprises means 
for processing image data to determine motion of at least one eye of the user and for 
repositioning the items based upon the motion. 

5 45. A controller of claim 1, wherein the signal processing means comprises means 
for processing image data to determine motion of a pupil of at least one eye of the 
user and for repositioning the items based upon the motion. 

46. A controller of claim 1, wherein the camera comprises a zoom attachment for 
10 automatically zooming into a desired magnification of at least one eye of the user. 

47. A controller of claim 1, wherein the camera comprises zoom means for 
automatically focusing on the user as the user moves in distance from the camera. 

15 48. A controller of claim 47, wherein the signal processing means comprises 
means for enlarging or shrinking the items on the display in response to focusing by 
the zoom means. 

49. A controller of claim 1, wherein the signal processing means comprises means 
20 for determining edges of a head of the user and for repositioning the items in 

response to movements of the edges. 

50. A controller of claim 1, wherein the signal processing means comprises means " 
for isolating one or more objects held by the user and for repositioning the items in 

25 response to movement of the objects. 

51. A controller of claim 1, wherein the signal processing means comprises means 
for isolating one or more parts of the user and for repositioning the items in response 
to movement of the parts. 

30 

64 



10/26/2006, EAST Version: 2.1,0.14 



WO 99/35633 



PCT/US99/00086 



52. A controller of claim 1, wherein the parts comprise at least one of a hand, 
head, and a foot. 

53. A controller of claim 1, wherein the signal processing means comprises means 
5 for isolating one or more symbols associated with the user and for repositioning the 

items in response to movement of the symbols. 

54. A controller of claim 1, further comprising a second camera constructed and 
arranged for viewing the user from above, the signal processing means having 

10 means for repositioning the items in response to movement detected from images in 
the second camera. 

55. A controller of claim 54, wherein signal processing means comprises means 
for repositioning the items in response to forward and backward movement of the 

15 user as detected by the second camera. 

56. A controller of claim 1, further comprising re-calibration means connected 
with the signal processing means for repositioning the items to an original position 
in response to a re-calibration event. 

20 

57. A controller of claim 1, wherein the re-calibration means comprises a 
microphone and the event comprises a sound generated by the user. 

58. A controller of claim 1, wherein the camera comprises the re-calibration 
25 means for detecting a blink of the user. 

59. A system for controlling a computer, comprising: 

a transducer for converting optical signals to electrical signals; 
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signal processor means for detecting motion in the digital data and providing a 
digital representation of said motion; 

communication means for entering one or more of the electronic signals, digital data, 
5 and digital representation into the computer to manipulate a computer display in 
response to the motion. 

» * 

60. A system of claim 59, wherein the computer comprises the signal processor 
means. 

61. A system of claim 59, wherein said signal processor comprises a digital signal 
10 processor separate from a CPU within the computer. 

62. A system of claim 59, wherein the transducer, electronic means and signal 
processor are constructed and arranged into a single device in communication with 
the computer. 

63. A system of claim 59, wherein the communication means comprises one of 
15 RS170 video, a PCI bus interface, a digital computer interface, a serial computer 

interface. 

64. A system of claim 59, further comprising means for repositioning a computer 
cursor in response to the motion. 

65. A system of claim 59, wherein the transducer comprises one or more of a 
20 visible CCD camera and an IR camera. 
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66. A system of claim 59, wherein the transducer comprises a CCD camera 
having at least 2x2 imaging pixels. 

67. A system of claim 66, wherein the camera comprises optics with various fields 
of view. 

5 68. A system of claim 59, wherein the transducer comprises one of a CCD or a 
CMOS integrated circuit with digital outputs. 

69. A system of claim 68, wherein the transducer generates RSI 70 ouput. 

70. A system of claim 68, wherein the transducer generates RS170 digital output. 

71. A system of claim 68, wherein the transducer generates digital resolutions of 4 
10 bits or greater 

72. A system of claim 59, wherein the signal processor comprises a video frame 
memory. 

73. A system of claim 59, wherein the signal processor comprises frame 
difference functionality, 

15 74. A system of claim 59, wherein the signal processor comprises video frame 
difference memory. 

75. A system of claim 59, wherein the signal processor comprises correlation 
functionality. 

76. A system of claim 59, wherein the signal processor comprises means for 
20 determining best fit motion. 
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77. A system of claim 59, further comprising means for controlling cursor 
movement. 



78. A system of claim 59, further comprising means for segmenting video images 
to provide multiple digital representations of the motion corresponding to different 
portions of the digital representation. 

79. A system of claim 78, wherein the optical signals are generated through image 

* 

acquisition of a portion of a human. 

80. A system of claim 78, wherein the optical signals are generated by viewing 
multiple features of a human. 

81. A system of claim 59, further comprising neural net means for learning user 
motion over time. 
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